Abstract: Multivariate functions are typically governed by anisotropic features such as edges in images or shock fronts in solutions of transport-dominated equations. One major goal, both for compression and for efficient analysis, is to provide optimally sparse approximations of such functions. Recently, cartoon-like images were introduced in 2D and 3D as a suitable model class, and approximation properties were measured by considering the decay rate of the error of the best $N$-term approximation. Shearlet systems are to date the only representation systems that provide optimally sparse approximations of this model class in 2D as well as 3D. Moreover, in contrast to all other directional representation systems, a theory for compactly supported shearlet frames has been derived, and these frames also satisfy this optimality benchmark. This chapter serves as an introduction to, and a survey of, sparse approximations of cartoon-like images by band-limited and compactly supported shearlet frames, as well as a reference for the state of the art of this research field.

1 Introduction 

Scientists face a rapidly growing deluge of data, which requires highly sophisticated methodologies for analysis and compression. Simultaneously, the complexity of the data is increasing, evidenced in particular by the observation that data becomes increasingly high-dimensional. Singularities are among the most prominent features of data, as is supported, for instance, by the observation from computer vision that the human eye is most sensitive to smooth geometric areas divided by sharp edges. Intriguingly, already the step from univariate to multivariate data causes a significant change in the behavior of singularities. Whereas one-dimensional (1D) functions can only exhibit point singularities, singularities of two-dimensional (2D) functions can be of both point and curvilinear type. Thus, besides isotropic features (point singularities), anisotropic features (curvilinear singularities) suddenly become possible. And, in fact, multivariate functions are typically governed by anisotropic phenomena. Think, for instance, of edges in digital images or evolving shock fronts in solutions of transport-dominated equations. These two exemplary situations also show that such phenomena occur for both explicitly and implicitly given data.

One major goal, both for compression and for efficient analysis, is the introduction of representation systems that provide 'good' approximations of anisotropic phenomena, more precisely, of multivariate functions governed by anisotropic features. This raises the following fundamental questions:

(P1) What is a suitable model for functions governed by anisotropic features?

(P2) How do we measure 'good' approximation, and what is a benchmark for optimality?

(P3) Is the step from 1D to 2D already the crucial step, or how does this framework scale with increasing dimension?

(P4) Which representation system behaves optimally?

Let us first discuss these questions on an intuitive level, and later delve into the precise mathematical formalism.

1.1 Choice of Model for Anisotropic Features 

Every model design faces a trade-off between closeness to the true situation and sufficient simplicity to enable analysis of the model. The model for functions governed by anisotropic features suggested in [9] resolves this trade-off in the following way. As a model for an image, it first of all requires the $L^2(\mathbb{R}^2)$ functions serving as a model to be supported on the unit square $[0,1]^2$. These functions shall then consist of the minimal number of smooth parts, namely two. To avoid artificial problems with a discontinuity ending at the boundary of $[0,1]^2$, the boundary curve of one of the smooth parts is entirely contained in $(0,1)^2$. It now remains to decide upon the regularity of the smooth parts of the model functions and of the boundary curve; both were chosen to be $C^2$. Thus, in summary, a suitable model for functions governed by anisotropic features are 2D functions which are supported on $[0,1]^2$ and $C^2$ apart from a closed $C^2$ discontinuity curve; these are typically referred to as cartoon-like images (cf. Chapter [1]). This provides an answer to (P1). Extensions of this 2D model to piecewise smooth curves were then suggested in [?], and extensions to 3D as well as to different types of regularity were introduced in [11, 15].
1.2 Measure for Sparse Approximation and Optimality

The performance of a representation system with respect to cartoon-like images is typically measured from a non-linear approximation viewpoint. More precisely, given a cartoon-like image and a representation system which forms an orthonormal basis, the chosen measure is the asymptotic behavior of the error of the best $N$-term (non-linear) approximation as the number of terms $N$ grows. This intuitively measures how fast the $\ell_2$-norm of the tail of the expansion decays as more and more terms are used for the approximation. A slight subtlety arises if the representation system does not form an orthonormal basis but a frame. In this case, one considers the $N$-term approximation using the $N$ largest coefficients, which, in the case of an orthonormal basis, coincides with the best $N$-term approximation, but not in general. The term 'optimally sparse approximation' is then awarded to those representation systems which deliver the fastest possible decay rate in $N$ for all cartoon-like images, where we consider log-factors as negligible, thereby providing an answer to (P2).

1.3 Why is 3D the Crucial Dimension? 

We already identified the step from 1D to 2D as crucial for the appearance of anisotropic features at all. Hence one might ask: Is it sufficient to consider only the 2D situation, with higher dimensions treated similarly? Or does each dimension cause its own problems? To answer these questions, let us consider the step from 2D to 3D, which reveals a curious phenomenon. A 3D function can exhibit point (= 0D), curvilinear (= 1D), and surface (= 2D) singularities. Thus, anisotropic features suddenly appear in two different dimensions: as one-dimensional and as two-dimensional features. Hence, the 3D situation has to be analyzed with particular care. It is not at all clear whether two different representation systems are required for optimally approximating both types of anisotropic features simultaneously, or whether one system will suffice. This shows that the step from 2D to 3D can justifiably also be called 'crucial'. Once it is known how to handle anisotropic features of different dimensions, the step from 3D to 4D, as well as the extension to even higher dimensions, can be dealt with in a similar way. Thus, answering (P3), we conclude that the two crucial dimensions are 2D and 3D, with higher-dimensional situations deriving from the analysis of those.

1.4 Performance of Shearlets and Other Directional Systems 

Within the framework we just briefly outlined, it can be shown that wavelets do not provide optimally sparse approximations of cartoon-like images. This initiated a flurry of activity within the applied harmonic analysis community with the aim of developing so-called directional representation systems which satisfy this benchmark, besides other desirable properties depending on the application at hand. In 2004, Candès and Donoho were the first to introduce, with the tight curvelet frames, a directional representation system which provides provably optimally sparse approximations of cartoon-like images in the sense we discussed. One year later, contourlets were introduced by Do and Vetterli [7], for which a similar optimal approximation rate was derived. The first analysis of the performance of (band-limited) shearlet frames was undertaken by Guo and Labate in [10], who proved that these shearlets also satisfy this benchmark. For (band-limited) shearlets the analysis was then driven even further, and very recently Guo and Labate proved a similar result for 3D cartoon-like images, which in this case are defined as functions that are $C^2$ apart from a $C^2$ discontinuity surface, i.e., focusing on only one of the two types of anisotropic features we are facing in 3D.



1.5 Band-Limited Versus Compactly Supported Systems 

The results mentioned in the previous subsection only concerned band-limited systems. Even in the contourlet case, although compactly supported contourlets seem to be included, the proof of optimal sparsity only works for band-limited generators due to the requirement of infinite directional vanishing moments. However, for various applications compactly supported generators are indispensable, which is why, already in the wavelet case, the introduction of compactly supported wavelets was a major advance. Prominent examples of such applications are the imaging sciences, where an image might need to be denoised while avoiding a smoothing of the edges, and the theory of partial differential equations, where such a system generates a trial space and compact support ensures fast computational realizations.

So far, shearlets are the only system for which a theory for compactly supported generators has been developed and compactly supported shearlet frames have been constructed [13]; see also the survey paper [16]. It should be mentioned, though, that these frames are in some sense close to being tight, but at this point it is not clear whether compactly supported tight shearlet frames can be constructed as well. Interestingly, it was proved in [17] that this class of shearlet frames also delivers optimally sparse approximations of the 2D cartoon-like image model class, with a proof very different from that in [10], now adapted to the particular nature of compactly supported generators. And with [15], the 3D situation is now also fully understood, even taking the two different types of anisotropic features (curvilinear and surface singularities) into account.



1.6 Outline 

In Sect. 2, we introduce the 2D and 3D cartoon-like image model class. Optimality of sparse approximations of this class is then discussed in Sect. 3. Sect. 4 is concerned with the introduction of 3D shearlet systems with both band-limited and compactly supported generators, which are shown to provide optimally sparse approximations within this class in the final Sect. 5.



2 Cartoon-like Image Class 

We start by making the definition of cartoon-like images, derived intuitively in the introduction of this chapter, mathematically precise. We begin with the most basic definition of this class, which was also historically the first, stated in [9]. We allow ourselves to state it together with its 3D version from [11] by remarking that $d$ can be either $d = 2$ or $d = 3$.

For fixed $\nu > 0$, the class $\mathcal{E}^2(\mathbb{R}^d)$ of cartoon-like images shall be the set of functions $f : \mathbb{R}^d \to \mathbb{C}$ of the form

$$f = f_0 + f_1 \chi_B,$$

where $B \subset [0,1]^d$ and $f_i \in C^2(\mathbb{R}^d)$ with $\operatorname{supp} f_0 \subset [0,1]^d$ and $\|f_i\|_{C^2} \le 1$ for each $i = 0, 1$. For dimension $d = 2$, we assume that $\partial B$ is a closed $C^2$-curve with curvature bounded by $\nu$, and, for $d = 3$, the discontinuity $\partial B$ shall be a closed $C^2$-surface with principal curvatures bounded by $\nu$. An arbitrarily chosen cartoon-like function $f = \chi_B$, where the discontinuity surface $\partial B$ is a deformed sphere in $\mathbb{R}^3$, is depicted in Fig. 1.



[Figure 1: A simple cartoon-like image $f = \chi_B \in \mathcal{E}^2_L(\mathbb{R}^3)$ with $L = 1$ for dimension $d = 3$, where the discontinuity surface $\partial B$ is a deformed sphere.]
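To make the definition concrete, the following minimal Python/NumPy sketch samples one hypothetical member of the 2D class on a grid. All specific choices are assumptions made for illustration only: $f_0$ is a tensor product of the $C^2$ bump $(t(1-t))^3$ (extended by zero), $f_1$ is a small smooth oscillation, and $B$ is a star-shaped domain whose boundary has radius $0.25 + 0.04\cos(3\theta)$ around $(1/2, 1/2)$. The helper names `bump` and `cartoon_2d` are ours, and the sketch does not verify the normalization $\|f_i\|_{C^2} \le 1$ exactly or compute the curvature bound $\nu$.

```python
import numpy as np

def bump(t):
    """(t(1-t))^3 on (0, 1), extended by zero: a C^2 function on R supported in [0, 1]."""
    return np.where((t > 0) & (t < 1), (t * (1 - t)) ** 3, 0.0)

def cartoon_2d(n=256):
    """Sample a hypothetical cartoon-like image f = f0 + f1 * chi_B on an n x n grid of [0, 1]^2."""
    x = (np.arange(n) + 0.5) / n                      # cell centers in (0, 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    f0 = 50.0 * bump(X) * bump(Y)                     # smooth part with supp f0 in [0, 1]^2
    f1 = 0.5 + 0.1 * np.sin(X) * np.cos(Y)            # smooth, bounded, with bounded derivatives
    theta = np.arctan2(Y - 0.5, X - 0.5)
    rho = np.hypot(X - 0.5, Y - 0.5)
    # star-shaped B; its boundary rho = 0.25 + 0.04 cos(3 theta) is a smooth closed curve in (0, 1)^2
    B = rho < 0.25 + 0.04 * np.cos(3 * theta)
    return f0 + f1 * B, B

f, B = cartoon_2d()
print(f.shape, B.mean())   # grid shape and the fraction of grid points inside B
```

The only non-smooth feature of the sampled array is the jump of size $f_1$ across $\partial B$; this curvilinear singularity is exactly what the approximation rates discussed below are sensitive to.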



Since 'objects' in images often have sharp corners, less regular images, where $\partial B$ is only assumed to be piecewise $C^2$-smooth, were also allowed in [?] for 2D and in [15] for 3D. We note that this viewpoint is also essential for being able to analyze the behavior of a system with respect to the two different types of anisotropic features appearing in 3D; see the discussion in Subsection 1.3. Letting $L \in \mathbb{N}$ denote the number of pieces, we speak of the extended class of cartoon-like images $\mathcal{E}^2_L(\mathbb{R}^d)$ as consisting of cartoon-like images having $C^2$-smoothness apart from a piecewise $C^2$ discontinuity curve in the 2D setting and a piecewise $C^2$ discontinuity surface in the 3D setting. Indeed, in the 3D setting, besides the discontinuity surfaces, this model exhibits curvilinear singularities as well as point singularities; e.g., the cartoon-like image $f = \chi_B$ in Fig. 2 exhibits a discontinuity surface $\partial B \subset \mathbb{R}^3$ consisting of three $C^2$-smooth surfaces with point and curvilinear singularities where these surfaces meet.
The model in [15] goes even one step further and considers a different regularity for the smooth parts, say being in $C^\beta$, and for the smooth pieces of the discontinuity, say being in $C^\alpha$ with $1 < \alpha \le \beta \le 2$. This very general class of cartoon-like images is then denoted by $\mathcal{E}^{\beta}_{\alpha,L}(\mathbb{R}^d)$, with the convention that $\mathcal{E}^2_L(\mathbb{R}^d) = \mathcal{E}^{\beta}_{\alpha,L}(\mathbb{R}^d)$ for $\alpha = \beta = 2$.

For the purpose of clarity, in the sequel we will focus on the first, most basic cartoon-like model where $\alpha = \beta = 2$, and add hints on generalizations when appropriate (in particular, in Sect. 5.2.4).



3 Sparse Approximations 

After having clarified the model situation, we will now discuss which measure for the accuracy of approximation by representation systems we choose, and what optimality means in this case.

3.1 (Non-Linear) N-term Approximations

Let $\mathcal{C}$ denote a given class of elements in a separable Hilbert space $\mathcal{H}$ with norm $\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}$, and let $\Phi = (\phi_i)_{i \in I}$ be a dictionary for $\mathcal{H}$, i.e., $\overline{\operatorname{span}}\,\Phi = \mathcal{H}$, with indexing set $I$. The dictionary $\Phi$ plays the role of our representation system. Later, $\mathcal{C}$ will be chosen to be the class of cartoon-like images and $\Phi$ a shearlet frame, but for now we assume this more general setting. We now seek to approximate each single element of $\mathcal{C}$ by 'few' terms of $\Phi$. Approximation theory provides us with the concept of best $N$-term approximation, which we now introduce; for a general introduction to approximation theory, we refer to [?].
For this, let $f \in \mathcal{H}$ be arbitrarily chosen. Since $\Phi$ is a complete system, for any $\varepsilon > 0$ there exists a finite linear combination of elements from $\Phi$ of the form

$$g = \sum_{i \in F} c_i \phi_i \quad \text{with } F \subset I \text{ finite, i.e., } \#|F| < \infty,$$

such that $\|f - g\| \le \varepsilon$. Moreover, if $\Phi$ is a frame with countable indexing set $I$, there exists a sequence $(c_i)_{i \in I} \in \ell_2(I)$ such that the representation

$$f = \sum_{i \in I} c_i \phi_i$$

holds with convergence in the Hilbert space norm $\|\cdot\|$. The reader should notice that, if $\Phi$ does not form a basis, this representation of $f$ is certainly not the only possible one. Letting now $N \in \mathbb{N}$, we aim to approximate $f$ by only $N$ terms of $\Phi$, i.e., by

$$\sum_{i \in I_N} c_i \phi_i \quad \text{with } I_N \subset I,\ \#|I_N| = N,$$

which is termed an $N$-term approximation to $f$. This approximation is typically non-linear in the sense that if $f_N$ is an $N$-term approximation to $f$ with indices $I_N$ and $g_N$ is an $N$-term approximation to some $g \in \mathcal{H}$ with indices $J_N$, then $f_N + g_N$ is in general an $N$-term approximation to $f + g$ only in case $I_N = J_N$.

But certainly we would like to pick the 'best' approximation, with the accuracy of approximation measured in the Hilbert space norm. We define the best $N$-term approximation to $f$ as the $N$-term approximation

$$f_N = \sum_{i \in I_N} c_i \phi_i$$

which satisfies, for all $I'_N \subset I$ with $\#|I'_N| = N$ and for all scalars $(c'_i)_{i \in I'_N}$,

$$\|f - f_N\| \le \Big\| f - \sum_{i \in I'_N} c'_i \phi_i \Big\|.$$

Let us next discuss the notion of best $N$-term approximation for the special cases of $\Phi$ forming an orthonormal basis, a tight frame, and a general frame, alongside an error estimate for the accuracy of this approximation.



3.1.1 Orthonormal Bases 

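Let $\Phi$ be an orthonormal basis for $\mathcal{H}$. In this case, we can actually write down the best $N$-term approximation $f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \phi_i$ for $f$. Since in this case

$$f = \sum_{i \in I} \langle f, \phi_i \rangle \phi_i,$$

and this representation is unique, we obtain, for any $N$-term approximation $\sum_{i \in I_N} c_i \phi_i$,

$$\Big\| f - \sum_{i \in I_N} c_i \phi_i \Big\|^2 = \Big\| \sum_{i \in I} \langle f, \phi_i \rangle \phi_i - \sum_{i \in I_N} c_i \phi_i \Big\|^2 = \big\| (\langle f, \phi_i \rangle - c_i)_{i \in I_N} \big\|_{\ell_2}^2 + \big\| (\langle f, \phi_i \rangle)_{i \in I \setminus I_N} \big\|_{\ell_2}^2.$$

The first term $\|(\langle f, \phi_i \rangle - c_i)_{i \in I_N}\|_{\ell_2}$ can be minimized by choosing $c_i = \langle f, \phi_i \rangle$ for all $i \in I_N$. The second term $\|(\langle f, \phi_i \rangle)_{i \in I \setminus I_N}\|_{\ell_2}$ can be minimized by choosing $I_N$ to be the indices of the $N$ largest coefficients $\langle f, \phi_i \rangle$ in magnitude. Notice that this does not uniquely determine $f_N$, since some coefficients $\langle f, \phi_i \rangle$ might have the same magnitude. But it precisely characterizes the set of best $N$-term approximations to $f$. Even more, we have complete control of the error of best $N$-term approximation by

$$\|f - f_N\| = \big\| (\langle f, \phi_i \rangle)_{i \in I \setminus I_N} \big\|_{\ell_2}. \tag{3.1}$$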
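The identity (3.1) is easy to check numerically. The following sketch (Python with NumPy) assumes a random orthonormal basis of $\mathbb{R}^{64}$, obtained from a QR factorization, as a stand-in for $\Phi$; the helper name `best_nterm_onb` is illustrative. It keeps the $N$ largest coefficients in magnitude and compares the approximation error with the $\ell_2$-norm of the discarded coefficients.

```python
import numpy as np

def best_nterm_onb(f, Q, N):
    """Best N-term approximation of f in the orthonormal basis formed by the columns of Q."""
    coeffs = Q.T @ f                             # coefficients <f, q_i>
    keep = np.argsort(-np.abs(coeffs))[:N]       # indices of the N largest |<f, q_i>|
    f_N = Q[:, keep] @ coeffs[keep]              # keep these terms with their own coefficients
    tail = np.delete(coeffs, keep)               # discarded coefficients, indexed by I \ I_N
    return f_N, tail

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))   # a random orthonormal basis of R^64
f = rng.standard_normal(64)
for N in (4, 16, 32):
    f_N, tail = best_nterm_onb(f, Q, N)
    # the two printed norms agree up to rounding, as stated in (3.1)
    print(N, np.linalg.norm(f - f_N), np.linalg.norm(tail))
```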



3.1.2 Tight Frames 

Assume now that $\Phi$ constitutes a tight frame with bound $A = 1$ for $\mathcal{H}$. In this situation, we still have

$$f = \sum_{i \in I} \langle f, \phi_i \rangle \phi_i,$$

but this expansion is no longer unique. Moreover, the frame elements are not orthogonal. Both conditions prohibit an analysis of the error of best $N$-term approximation as in the previously considered situation of an orthonormal basis. And, in fact, examples can be provided to show that selecting the $N$ largest coefficients $\langle f, \phi_i \rangle$ in magnitude does not always lead to the best $N$-term approximation, but merely to an $N$-term approximation. To still be able to analyze the approximation error, one typically (as will also be our choice in the sequel) chooses the $N$-term approximation provided by the indices $I_N$ associated with the $N$ largest coefficients $\langle f, \phi_i \rangle$ in magnitude, together with these coefficients, i.e.,

$$f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \phi_i.$$

This selection also allows for some control of the approximation in the Hilbert space norm, which we defer to the next subsection, in which we consider the more general case of arbitrary frames.
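A small instance of this effect can be sketched in Python under the assumption that $\mathcal{H} = \mathbb{R}^2$ and $\Phi$ is the three-vector 'Mercedes-Benz' frame scaled to have bound $A = 1$. For $f$ parallel to $\phi_0$, keeping the largest coefficient $\langle f, \phi_0 \rangle$ together with that coefficient leaves an error of $1/3$ (since $\|\phi_0\|^2 = 2/3$), whereas the best 1-term approximation, found here by brute force over all index sets with least-squares coefficients, reproduces $f$ exactly. The function names are illustrative.

```python
import itertools
import numpy as np

# Mercedes-Benz frame: three vectors at 120 degrees, scaled so that sum_i phi_i phi_i^T = I
theta = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
Phi = np.sqrt(2 / 3) * np.stack([np.cos(theta), np.sin(theta)])  # shape (2, 3); columns are phi_i
assert np.allclose(Phi @ Phi.T, np.eye(2))  # tight frame with bound A = 1

def greedy_nterm(f, Phi, N):
    """Keep the N largest frame coefficients <f, phi_i> and use them unchanged."""
    c = Phi.T @ f
    keep = np.argsort(-np.abs(c))[:N]
    return Phi[:, keep] @ c[keep]

def best_nterm_error(f, Phi, N):
    """Brute-force best N-term error: every index set, coefficients fitted by least squares."""
    best = np.inf
    for S in itertools.combinations(range(Phi.shape[1]), N):
        A = Phi[:, list(S)]
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        best = min(best, np.linalg.norm(f - A @ coef))
    return best

f = np.array([0.0, 1.0])                                # parallel to phi_0
print(np.linalg.norm(f - greedy_nterm(f, Phi, 1)))      # about 0.3333
print(best_nterm_error(f, Phi, 1))                      # essentially 0
```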



3.1.3 General Frames 

Let now $\Phi$ form a frame for $\mathcal{H}$ with frame bounds $A$ and $B$, and let $(\tilde{\phi}_i)_{i \in I}$ denote the canonical dual frame. We then consider the expansion of $f$ in terms of this dual frame, i.e.,

$$f = \sum_{i \in I} \langle f, \phi_i \rangle \tilde{\phi}_i. \tag{3.2}$$

Notice that we could also consider

$$f = \sum_{i \in I} \langle f, \tilde{\phi}_i \rangle \phi_i.$$

Let us explain why the first form is of more interest to us in this chapter. By definition, we have $(\langle f, \phi_i \rangle)_{i \in I} \in \ell_2(I)$ as well as $(\langle f, \tilde{\phi}_i \rangle)_{i \in I} \in \ell_2(I)$. Since we only consider expansions of functions $f$ belonging to a subset $\mathcal{C}$ of $\mathcal{H}$, this can, at least potentially, improve the decay rate of the coefficients so that they belong to $\ell_p(I)$ for some $p < 2$. This is exactly what is understood by sparse approximation (also called compressible approximation in the context of inverse problems). We hence aim to analyze shearlets with respect to this behavior, i.e., the decay rate of the shearlet coefficients $\langle f, \phi_i \rangle$. This then naturally leads to the form (3.2). We remark that in the case of a tight frame with bound $A = 1$, no distinction is necessary, since then $\tilde{\phi}_i = \phi_i$ for all $i \in I$.

As in the tight frame case, it is not possible to derive a usable, explicit form for the best $N$-term approximation. We therefore again crudely approximate the best $N$-term approximation by choosing the $N$-term approximation provided by the indices $I_N$ associated with the $N$ largest coefficients $\langle f, \phi_i \rangle$ in magnitude, together with these coefficients, i.e.,

$$f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde{\phi}_i.$$

But, surprisingly, even with this rather crude greedy selection procedure, we obtain very strong results for the approximation rate of shearlets, as we will see in Sect. 5. The following result shows how the $N$-term approximation error can be bounded by the tail of the squares of the coefficients $c_i = \langle f, \phi_i \rangle$. The reader might want to compare this result with the error in the case of an orthonormal basis stated in (3.1).



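Lemma 3.1. Let $(\phi_i)_{i \in I}$ be a frame for $\mathcal{H}$ with frame bounds $A$ and $B$, and let $(\tilde{\phi}_i)_{i \in I}$ be the canonical dual frame. Let $I_N \subset I$ with $\#|I_N| = N$, and let $f_N$ be the $N$-term approximation $f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde{\phi}_i$. Then

$$\|f - f_N\|^2 \le \frac{1}{A} \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2. \tag{3.3}$$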



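Proof. Recall that the canonical dual frame satisfies the frame inequality with bounds $B^{-1}$ and $A^{-1}$. At first sight, it therefore might look as if the estimate (3.3) should follow directly from the frame inequality for the canonical dual. However, since the sum in (3.3) does not run over the entire index set $I$, but only over $I \setminus I_N$, this is not the case. So, to prove the lemma, we first consider

$$\|f - f_N\| = \sup\{ |\langle f - f_N, g \rangle| : g \in \mathcal{H},\ \|g\| = 1 \} = \sup\Big\{ \Big| \sum_{i \notin I_N} \langle f, \phi_i \rangle \langle \tilde{\phi}_i, g \rangle \Big| : g \in \mathcal{H},\ \|g\| = 1 \Big\}.$$

Using the Cauchy-Schwarz inequality, we then have that

$$\Big| \sum_{i \notin I_N} \langle f, \phi_i \rangle \langle \tilde{\phi}_i, g \rangle \Big|^2 \le \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2 \sum_{i \notin I_N} |\langle \tilde{\phi}_i, g \rangle|^2 \le A^{-1} \|g\|^2 \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2, \tag{3.4}$$

where we have used the upper frame inequality for the dual frame $(\tilde{\phi}_i)_i$ in the second step. We can now continue (3.4) and arrive at

$$\|f - f_N\|^2 \le \sup\Big\{ A^{-1} \|g\|^2 \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2 : g \in \mathcal{H},\ \|g\| = 1 \Big\} = \frac{1}{A} \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2.$$

□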
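The bound (3.3) can also be checked numerically in finite dimensions. The sketch below rests on assumptions of our own: a random $8 \times 20$ matrix whose columns serve as a frame for $\mathbb{R}^8$, the frame operator $S = \sum_i \phi_i \phi_i^T$, the optimal lower frame bound $A$ taken as the smallest eigenvalue of $S$, and the canonical dual $\tilde{\phi}_i = S^{-1}\phi_i$. The helper `nterm_dual` is illustrative and implements $f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde{\phi}_i$ for the $N$ largest coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 20                              # dimension of the space and number of frame vectors
Phi = rng.standard_normal((n, m))         # columns phi_i; a random spanning set is a frame for R^n
S = Phi @ Phi.T                           # frame operator S = sum_i phi_i phi_i^T
A = np.linalg.eigvalsh(S).min()           # optimal lower frame bound
Phi_dual = np.linalg.solve(S, Phi)        # canonical dual frame: columns S^{-1} phi_i

def nterm_dual(f, N):
    """Return f_N built from the N largest |<f, phi_i>| (with dual synthesis) and the discarded coefficients."""
    c = Phi.T @ f
    order = np.argsort(-np.abs(c))
    keep, drop = order[:N], order[N:]
    return Phi_dual[:, keep] @ c[keep], c[drop]

f = rng.standard_normal(n)
for N in (0, 5, 10, 20):
    f_N, tail = nterm_dual(f, N)
    lhs = np.linalg.norm(f - f_N) ** 2
    rhs = np.sum(tail ** 2) / A
    print(N, lhs <= rhs + 1e-12)          # (3.3) holds for every N: prints True
```

Keeping all $m$ coefficients reproduces $f$ exactly, since $\sum_i \langle f, \phi_i \rangle S^{-1}\phi_i = S^{-1}Sf = f$.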

Relating to the previous discussion about the decay of coefficients $\langle f, \phi_i \rangle$, let $c^*$ denote the non-increasing (in modulus) rearrangement of $c = (c_i)_{i \in I} = (\langle f, \phi_i \rangle)_{i \in I}$, i.e., $c^*_n$ denotes the $n$th largest coefficient of $c$ in modulus. This rearrangement corresponds to a bijection $\pi : \mathbb{N} \to I$ that satisfies

$$c_{\pi(n)} = c^*_n \quad \text{for all } n \in \mathbb{N}.$$

Strictly speaking, the rearrangement (and hence the mapping $\pi$) might not be unique; we will simply take $c^*$ to be one of these rearrangements. Since $c \in \ell_2(I)$, also $c^* \in \ell_2(\mathbb{N})$. Suppose further that $|c^*_n|$ even decays as

$$|c^*_n| \lesssim n^{-(\alpha+1)/2} \quad \text{for } n \to \infty$$

for some $\alpha > 0$, where the notation $h(n) \lesssim g(n)$ means that there exists a $C > 0$ such that $h(n) \le C g(n)$, i.e., $h(n) = O(g(n))$. Clearly, we then have $c^* \in \ell_p(\mathbb{N})$ for $p > \frac{2}{\alpha+1}$. By Lemma 3.1, the $N$-term approximation error will therefore decay as

$$\|f - f_N\|^2 \le \frac{1}{A} \sum_{n > N} |c^*_n|^2 \lesssim \sum_{n > N} n^{-\alpha-1} \asymp N^{-\alpha},$$

where $f_N$ is the $N$-term approximation of $f$ obtained by keeping the $N$ largest coefficients, that is,

$$f_N = \sum_{n=1}^{N} c^*_n \tilde{\phi}_{\pi(n)}. \tag{3.5}$$

The notation $h(n) \asymp g(n)$, also written $h(n) = \Theta(g(n))$, used above means that $h$ is bounded both above and below by $g$ asymptotically as $n \to \infty$, that is, $h(n) = O(g(n))$ and $g(n) = O(h(n))$.
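The rearrangement $\pi$ and the tail estimate can be illustrated on a synthetic coefficient sequence. In the following sketch, the decay parameter $\alpha = 1$, the truncation length $M$, and the random ordering are assumptions; `np.argsort` produces one admissible choice of $\pi$ (ties would be broken arbitrarily, in line with the non-uniqueness remark above), and the bounds in the comments are the standard integral comparison for the decreasing function $x \mapsto x^{-(\alpha+1)}$.

```python
import numpy as np

alpha = 1.0                                   # assumed decay parameter: |c*_n| = n^{-(alpha+1)/2}
M = 200_000                                   # length of the finite synthetic sequence
rng = np.random.default_rng(2)
n = np.arange(1, M + 1)
c = rng.permutation(n ** (-(alpha + 1) / 2))  # the same coefficients in an arbitrary index order

pi = np.argsort(-np.abs(c), kind="stable")    # pi[k]: index of the (k+1)-th largest coefficient
c_star = np.abs(c[pi])                        # non-increasing rearrangement c*
assert np.all(np.diff(c_star) <= 0)

def tail(N):
    """sum_{n > N} |c*_n|^2 over the finite sequence (c_star[0] is c*_1)."""
    return np.sum(c_star[N:] ** 2)

for N in (10, 100, 1000):
    # ((N+1)^{-alpha} - (M+1)^{-alpha}) / alpha <= tail(N) <= N^{-alpha} / alpha
    lower = ((N + 1) ** -alpha - (M + 1) ** -alpha) / alpha
    upper = N ** -alpha / alpha
    print(N, lower <= tail(N) <= upper)       # prints True for each N
```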



3.2 A Notion of Optimality 

We now return to the setting of function spaces $\mathcal{H} = L^2(\mathbb{R}^d)$, where the subset $\mathcal{C}$ will be the class of cartoon-like images, that is, $\mathcal{C} = \mathcal{E}^2(\mathbb{R}^d)$. We then aim for a benchmark, i.e., an optimality statement, for sparse approximation of functions in $\mathcal{E}^2(\mathbb{R}^d)$. For this, we will again only require that our representation system $\Phi$ is a dictionary, that is, we assume only that $\Phi = (\phi_i)_{i \in I}$ is a complete family of functions in $L^2(\mathbb{R}^d)$ with $I$ not necessarily countable. Without loss of generality, we can assume that the elements $\phi_i$ are normalized, i.e., $\|\phi_i\|_{L^2} = 1$ for all $i \in I$. For $f \in \mathcal{E}^2(\mathbb{R}^d)$ we then consider expansions of the form

$$f = \sum_{i \in I_f} c_i \phi_i,$$

where $I_f \subset I$ is a countable selection from $I$ that may depend on $f$. Relating to the previous subsection, the first $N$ elements of $\Phi_f := (\phi_i)_{i \in I_f}$ could, for instance, be the $N$ terms from $\Phi$ selected for the best $N$-term approximation of $f$.
Since artificial cases shall be avoided, this selection procedure is subject to the following natural restriction, usually termed polynomial depth search: The $n$th term in $\Phi_f$ is obtained by only searching through the first $q(n)$ elements of the list $\Phi_f$, where $q$ is a polynomial. Moreover, the selection rule may adaptively depend on $f$, and the $n$th element may also be modified adaptively and depend on the first $n - 1$ chosen elements. We shall denote any sequence of coefficients $c_i$ chosen according to these restrictions by $c(f) = (c_i)_i$. The role of the polynomial $q$ is to limit how deep, or how far down in the listed dictionary $\Phi_f$, we are allowed to search for the next element $\phi_i$ in the approximation. Without such a depth search limit, one could choose $\Phi$ to be a countable, dense subset of $L^2(\mathbb{R}^d)$, which would yield arbitrarily good sparse approximations, but also approximations that are infeasible in practice.
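The following toy sketch illustrates only the mechanics of a depth restriction; it is not the selection procedure analyzed in [8, 15], whose result covers any rule obeying the restriction. We assume a finite dictionary of unit-norm vectors given in a fixed order, a hypothetical depth polynomial $q(n) = n^2 + 10$, and a matching-pursuit-style rule (our choice) that picks, among the first $q(n)$ entries, the element most correlated with the current residual. The function `depth_limited_greedy` is hypothetical.

```python
import numpy as np

def depth_limited_greedy(f, dictionary, n_terms, q):
    """Toy greedy expansion under polynomial depth search (hypothetical selection rule).

    dictionary: array of shape (dim, K) with unit-norm columns in a fixed order.
    q: depth bound; the n-th term may only be chosen among columns 0, ..., q(n) - 1.
    """
    residual = np.array(f, dtype=float)
    indices, coeffs = [], []
    for n in range(1, n_terms + 1):
        depth = min(q(n), dictionary.shape[1])        # search window for the n-th term
        scores = np.abs(dictionary[:, :depth].T @ residual)
        j = int(np.argmax(scores))                    # adaptive: depends on f and earlier picks
        coef = float(dictionary[:, j] @ residual)     # matching-pursuit coefficient
        residual -= coef * dictionary[:, j]
        indices.append(j)
        coeffs.append(coef)
    return indices, coeffs, residual

rng = np.random.default_rng(3)
D = rng.standard_normal((16, 500))
D /= np.linalg.norm(D, axis=0)                        # normalize columns: ||phi_i|| = 1
f = rng.standard_normal(16)
idx, cs, r = depth_limited_greedy(f, D, n_terms=6, q=lambda n: n ** 2 + 10)
print(idx)  # the n-th selected index is always smaller than n**2 + 10
```

Without the bound `q`, the same loop could scan an arbitrarily long (e.g. dense) dictionary at every step, which is precisely what the restriction rules out.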

Using information-theoretic arguments, it was then shown in [8, 15] that, almost no matter what selection procedure we use to find the coefficients $c(f)$, we cannot have $\|c(f)\|_{\ell_p}$ bounded for $p < \frac{2(d-1)}{d+1}$ for $d = 2, 3$.



Theorem 3.2 ([8, 15]). Retaining the definitions and notations in this subsection and allowing only polynomial depth search, we obtain

$$\max_{f \in \mathcal{E}^2(\mathbb{R}^d)} \|c(f)\|_{\ell_p} = +\infty \quad \text{for } p < \frac{2(d-1)}{d+1}.$$

In case $\Phi$ is an orthonormal basis for $L^2(\mathbb{R}^d)$, the norm $\|c(f)\|_{\ell_p}$ is trivially bounded for $p \ge 2$, since we can take $c(f) = (c_i)_{i \in I} = (\langle f, \phi_i \rangle)_{i \in I}$. Although not explicitly stated, the proof can be straightforwardly extended from 3D to higher dimensions, as can the definition of cartoon-like images. It is then intriguing to analyze the behavior of $\frac{2(d-1)}{d+1}$ from Thm. 3.2. In fact, as $d \to \infty$, we observe that $\frac{2(d-1)}{d+1} \to 2$. Thus, the decay of any $c(f)$ for cartoon-like images becomes slower as $d$ grows and approaches $\ell_2$, which, as we just mentioned, is actually the rate guaranteed for all $f \in L^2(\mathbb{R}^d)$.

Thm. 3.2 is truly a statement about the optimal achievable sparsity level: No representation system (up to the restrictions described above) can deliver approximations for $\mathcal{E}^2(\mathbb{R}^d)$ with coefficients satisfying $c(f) \in \ell_p$ for $p < \frac{2(d-1)}{d+1}$. This implies the following lower bound

$$|c(f)^*_n| \gtrsim n^{-\frac{d+1}{2(d-1)}} = \begin{cases} n^{-3/2} & : d = 2, \\ n^{-1} & : d = 3, \end{cases} \tag{3.6}$$

where $c(f)^* = (c(f)^*_n)_{n \in \mathbb{N}}$ is a decreasing (in modulus) arrangement of the coefficients $c(f)$.
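The exponents appearing in Thm. 3.2, in (3.6), and in the resulting $N$-term error rate are related by simple arithmetic. The following Python sketch, using exact rational arithmetic from the standard `fractions` module (the function name is illustrative), tabulates them for several $d$; it also exhibits the limit $\frac{2(d-1)}{d+1} \to 2$ discussed above and the relation that the coefficient decay exponent is the reciprocal of the critical $p$.

```python
from fractions import Fraction

def benchmark_exponents(d):
    """Exponents behind Thm. 3.2, (3.6) and the optimal N-term rate, for dimension d >= 2."""
    p_crit = Fraction(2 * (d - 1), d + 1)        # ||c(f)||_{l_p} unbounded for p < p_crit
    coeff_decay = Fraction(d + 1, 2 * (d - 1))   # |c(f)*_n| >~ n^(-coeff_decay), cf. (3.6)
    error_rate = 2 * coeff_decay - 1             # sum_{n>N} n^(-2*coeff_decay) ~ N^(-error_rate)
    return p_crit, coeff_decay, error_rate

for d in (2, 3, 4, 10, 100):
    print(d, *benchmark_exponents(d))
# the first two printed lines are "2 2/3 3/2 2" and "3 1 1 1"
```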

One might ask how this relates to the approximation error of (best) $N$-term approximation discussed before. For simplicity, suppose for a moment that $\Phi$ is actually an orthonormal basis (or, more generally, a Riesz basis) for $L^2(\mathbb{R}^d)$ with $d = 2$ or $d = 3$. Then, as discussed in Sect. 3.1.1, the best $N$-term approximation to $f \in \mathcal{E}^2(\mathbb{R}^d)$ is obtained by keeping the $N$ largest coefficients. Using the error estimate (3.1) as well as (3.6), we obtain

$$\|f - f_N\|_{L^2}^2 = \sum_{n > N} |c(f)^*_n|^2 \gtrsim \sum_{n > N} n^{-\frac{d+1}{d-1}} \asymp N^{-\frac{2}{d-1}},$$

i.e., the best $N$-term approximation error $\|f - f_N\|_{L^2}^2$ behaves asymptotically as $N^{-\frac{2}{d-1}}$ or worse. If, more generally, $\Phi$ is a frame and $f_N$ is chosen as in (3.5), we can similarly conclude that the asymptotic lower bound for $\|f - f_N\|_{L^2}^2$ is $N^{-\frac{2}{d-1}}$, that is, the optimally achievable rate is, at best, $N^{-\frac{2}{d-1}}$. Thus, this optimal rate can be used as a benchmark for measuring the ability of different representation systems to sparsely approximate cartoon-like images. Let us phrase this formally.

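Definition 3.1. Let $\Phi = (\phi_i)_{i \in I}$ be a frame for $L^2(\mathbb{R}^d)$ with $d = 2$ or $d = 3$. We say that $\Phi$ provides optimally sparse approximations of cartoon-like images if, for each $f \in \mathcal{E}^2(\mathbb{R}^d)$, the associated $N$-term approximation $f_N$ (cf. (3.5)) obtained by keeping the $N$ largest coefficients of $c = c(f) = (\langle f, \phi_i \rangle)_{i \in I}$ satisfies

$$\|f - f_N\|_{L^2}^2 \lesssim N^{-\frac{2}{d-1}} \quad \text{as } N \to \infty, \tag{3.7}$$

and

$$|c^*_n| \lesssim n^{-\frac{d+1}{2(d-1)}} \quad \text{as } n \to \infty, \tag{3.8}$$

where we ignore log-factors.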



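Note that, for frames $\Phi$, the bound $|c^*_n| \lesssim n^{-\frac{d+1}{2(d-1)}}$ automatically implies that $\|f - f_N\|_{L^2}^2 \lesssim N^{-\frac{2}{d-1}}$ whenever $f_N$ is chosen as in (3.5). This follows from Lemma 3.1 and the estimate

$$\sum_{n > N} |c^*_n|^2 \lesssim \sum_{n > N} n^{-\frac{d+1}{d-1}} \le \int_N^\infty x^{-\frac{d+1}{d-1}} \, dx \le C \cdot N^{-\frac{2}{d-1}}, \tag{3.9}$$

where we have used that $-\frac{d+1}{d-1} + 1 = -\frac{2}{d-1}$. Hence, we are searching for a representation system $\Phi$ which forms a frame and delivers decay of $c = (\langle f, \phi_i \rangle)_{i \in I}$ as (up to log-factors)

$$|c^*_n| \lesssim n^{-\frac{d+1}{2(d-1)}} = \begin{cases} n^{-3/2} & : d = 2, \\ n^{-1} & : d = 3, \end{cases} \tag{3.10}$$

as $n \to \infty$ for any cartoon-like image.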



3.3 Approximation by Fourier Series and Wavelets 

We will next study two examples of more traditional representation systems, the Fourier basis and wavelets, with respect to their ability to meet this benchmark. For this, we choose the function $f = \chi_B$, where $B$ is a ball contained in $[0,1]^d$, again with $d = 2$ or $d = 3$, as a simple cartoon-like image in $\mathcal{E}^2_L(\mathbb{R}^d)$ with $L = 1$, analyze the error $\|f - f_N\|$ for $f_N$ being the $N$-term approximation by the $N$ largest coefficients, and compare it with the optimal decay rate stated in Definition 3.1. It will, however, turn out that these systems are far from providing optimally sparse approximations of cartoon-like images, thus underlining the pressing need to introduce representation systems delivering this optimal rate; we already now refer to Sect. 5, in which shearlets will be proven to satisfy this property.

Since Fourier series and wavelet systems are orthonormal bases (or, more generally, Riesz bases), the best $N$-term approximation is found by keeping the $N$ largest coefficients, as discussed in Sect. 3.1.1.



3.3.1 Fourier Series 

The squared error of the best $N$-term Fourier series approximation of a typical cartoon-like image decays asymptotically as $N^{-1/d}$. The following proposition shows this behavior in the case of a very simple cartoon-like image: the characteristic function of a ball.

Proposition 3.3. Let $d \in \mathbb{N}$, and let $\Phi = (e^{2\pi i k \cdot x})_{k \in \mathbb{Z}^d}$. Suppose $f = \chi_B$, where $B$ is a ball contained in $[0,1]^d$. Then

$$\|f - f_N\|_{L^2}^2 \asymp N^{-1/d} \quad \text{for } N \to \infty,$$

where $f_N$ is the best $N$-term approximation from $\Phi$.

Proof. We fix a new origin as the center of the ball $B$. Then $f$ is a radial function $f(x) = h(\|x\|_2)$ for $x \in \mathbb{R}^d$. The Fourier transform of $f$ is also a radial function and can be expressed explicitly by Bessel functions of the first kind [14, 18]:

$$\hat{f}(\xi) = r^{d/2} \, \frac{J_{d/2}(2\pi r \|\xi\|_2)}{\|\xi\|_2^{d/2}},$$

where $r$ is the radius of the ball $B$. Since the Bessel function $J_{d/2}(x)$ decays like $x^{-1/2}$ as $x \to \infty$, the Fourier transform of $f$ decays like $|\hat{f}(\xi)| \asymp \|\xi\|_2^{-(d+1)/2}$ as $\|\xi\|_2 \to \infty$. Letting $I_N = \{k \in \mathbb{Z}^d : \|k\|_2 \le N\}$ and $f_{I_N}$ be the partial Fourier sum with terms from $I_N$, we obtain

$$\|f - f_{I_N}\|_{L^2}^2 = \sum_{k \notin I_N} |\hat{f}(k)|^2 \asymp \int_{\|\xi\|_2 > N} \|\xi\|_2^{-(d+1)} \, d\xi = C_d \int_N^\infty \rho^{-(d+1)} \rho^{d-1} \, d\rho = C_d \int_N^\infty \rho^{-2} \, d\rho = C_d \, N^{-1}.$$

The conclusion now follows from the cardinality $\#|I_N| \asymp N^d$ as $N \to \infty$. □
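The rate in Proposition 3.3 can be observed numerically for $d = 2$. The sketch below is an illustration under stated assumptions: a disc of radius $r = 0.25$, Fourier coefficients $|\hat{f}(k)| = r\,|J_1(2\pi r\|k\|_2)|/\|k\|_2$ (the formula above with $d = 2$) computed on the box $[-60, 60]^2$, and $J_1$ evaluated through its integral representation with the periodic trapezoidal rule so that only NumPy is needed (`scipy.special.j1` could be used instead). The squared error of the best $N$-term approximation is obtained from Parseval's identity, $\|f - f_N\|^2 = |B| - \sum_{n \le N} |c^*_n|^2$, and the fitted log-log slope should be close to $-1/d = -1/2$.

```python
import numpy as np

def bessel_j1(x, n_quad=256):
    """J_1(x) = (1/(2 pi)) * integral over [-pi, pi] of cos(t - x sin t) dt (periodic trapezoidal rule)."""
    t = np.linspace(-np.pi, np.pi, n_quad, endpoint=False)
    return np.cos(t[None, :] - np.outer(x, np.sin(t))).mean(axis=1)

r, K = 0.25, 60                                    # disc radius and frequency cutoff (illustrative choices)
k1, k2 = np.meshgrid(np.arange(-K, K + 1), np.arange(-K, K + 1))
rho = np.hypot(k1, k2).ravel()                     # ||k||_2 for all k in the box [-K, K]^2
coef = np.full(rho.shape, np.pi * r**2)            # hat f(0) = area of the disc
nz = rho > 0
coef[nz] = r * bessel_j1(2 * np.pi * r * rho[nz]) / rho[nz]   # hat f(k) up to a unimodular phase

energy = np.pi * r**2                              # ||f||^2 = area of the disc (Parseval)
c2 = np.sort(coef**2)[::-1]                        # squared coefficients, largest first

def err2(N):
    """Squared L^2 error of the best N-term Fourier approximation, via Parseval."""
    return energy - c2[:N].sum()

N1, N2 = 20, 2000
slope = np.log(err2(N2) / err2(N1)) / np.log(N2 / N1)
print(slope)   # roughly -1/d = -0.5
```

The cutoff only needs to be large enough that the $N_2$ largest coefficients lie inside the box; the remaining energy is accounted for exactly by Parseval's identity.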
3.3.2 Wavelets

Since wavelets are designed to deliver sparse representations of singularities (see Chapter [1]), we expect this system to outperform the Fourier approach. This will indeed be the case. However, the optimal rate will still be missed by far. The best $N$-term approximation of a typical cartoon-like image using a wavelet basis performs only slightly better than Fourier series, with the squared error behaving asymptotically as $N^{-1/(d-1)}$. This is illustrated by the following result.

Proposition 3.4. Let $d = 2, 3$, and let $\Phi$ be a wavelet basis for $L^2(\mathbb{R}^d)$ or $L^2([0,1]^d)$. Suppose $f = \chi_B$, where $B$ is a ball contained in $[0,1]^d$. Then

$$\|f - f_N\|_{L^2}^2 \asymp N^{-\frac{1}{d-1}} \quad \text{for } N \to \infty,$$

where $f_N$ is the best $N$-term approximation from $\Phi$.

Proof. Let us first consider wavelet approximation by the Haar tensor wavelet basis for $L^2([0,1]^d)$ of the form

$$\{ \phi_{J,k} : |k| \le 2^J - 1 \} \cup \{ \psi^1_{j,k}, \ldots, \psi^{2^d - 1}_{j,k} : j \ge J,\ |k| \le 2^j - 1 \},$$

where $J \in \mathbb{N}_0$, $k \in \mathbb{N}_0^d$, and $g_{j,k} = 2^{jd/2} g(2^j \cdot - k)$ for $g \in L^2(\mathbb{R}^d)$. There are only finitely many coefficients of the form $\langle f, \phi_{J,k} \rangle$, hence we do not need to consider these for our asymptotic estimate. For simplicity, we take $J = 0$. At scale $j \ge 0$ there exist $O(2^{j(d-1)})$ non-zero wavelet coefficients, since the surface area of $\partial B$ is finite and the wavelet elements are of size $2^{-j} \times \cdots \times 2^{-j}$.

To illustrate the calculations leading to the sought approximation error rate, we will first consider the case where $B$ is a cube in $[0,1]^d$. For this, we first consider the non-zero coefficients associated with the face of the cube containing the point $(b, c, \ldots, c)$. For scale $j$, let $k$ be such that $\operatorname{supp} \psi^1_{j,k} \cap \operatorname{supp} f \neq \emptyset$, where $\psi^1(x) = h(x_1) p(x_2) \cdots p(x_d)$, and $h$ and $p$ are the Haar wavelet and scaling function, respectively. Assume that $b$ is located in the first half of the interval $[2^{-j} k_1, 2^{-j}(k_1 + 1)]$; the other case can be handled similarly. Then

$$|\langle f, \psi^1_{j,k} \rangle| = \Big| \int_b^{2^{-j}(k_1+1)} 2^{jd/2} h(2^j x_1 - k_1) \, dx_1 \Big| \prod_{i=2}^{d} \int_{2^{-j}k_i}^{2^{-j}(k_i+1)} dx_i = (b - 2^{-j} k_1) \, 2^{-j(d-1)} \, 2^{jd/2} \asymp 2^{-jd/2},$$

where we have used that $b - 2^{-j} k_1$ will typically be of size comparable to $2^{-j}$. Note that for the chosen $j$ and $k$ above, we also have that $\langle f, \psi^l_{j,k} \rangle = 0$ for all $l = 2, \ldots, 2^d - 1$.

There will be on the order of $2^{j(d-1)}$ nonzero coefficients of size $\asymp 2^{-jd/2}$ associated with the wavelet $\psi^1$ at scale $j$. The same conclusion holds for the other wavelets $\psi^l$, $l = 2, \ldots, 2^d - 1$. To summarize, at scale $j$ there will be $C 2^{j(d-1)}$ nonzero coefficients of size $C 2^{-jd/2}$. On the first $j_0$ scales, that is, $j = 0, 1, \ldots, j_0$, we therefore have $\sum_{j=0}^{j_0} 2^{j(d-1)} \asymp 2^{j_0(d-1)}$ nonzero coefficients. The $n$th largest coefficient $c^*_n$ is of size $n^{-\frac{d}{2(d-1)}}$ since, for $n = 2^{j(d-1)}$, we have

$$2^{-jd/2} = \big(2^{j(d-1)}\big)^{-\frac{d}{2(d-1)}} = n^{-\frac{d}{2(d-1)}}.$$
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Therefore, 



n>N n>N " ^ 



Hence, for the best A^-term approximation fy of / using a wavelet basis, we obtain 
the asymptotic estimates 



1 



Let us now consider the situation that $B$ is a ball. In fact, in this case we can do similar (but less transparent) calculations leading to the same asymptotic estimates as above. We will not repeat these calculations here, but simply remark that the upper asymptotic bound in $|\langle f, \psi^l_{j,k}\rangle| \asymp 2^{-jd/2}$ can be seen by the following general argument:
$$|\langle f, \psi^l_{j,k}\rangle| \le \|f\|_{L^\infty}\|\psi^l_{j,k}\|_{L^1} \le \|f\|_{L^\infty}\|\psi^l\|_{L^1}\,2^{-jd/2} \le C\,2^{-jd/2},$$
which holds for each $l = 1, \dots, 2^d - 1$.

Finally, we can conclude from our calculations that choosing another wavelet basis will not improve the approximation rate. □
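The counting argument in the proof can be checked numerically. The following Python sketch is a heuristic model, not part of the original proof: it assumes that scale $j$ contributes exactly $2^{j(d-1)}$ coefficients of size exactly $2^{-jd/2}$ (all constants set to 1) and truncates at a finite maximal scale. Since the sizes decrease with $j$, the $N$ largest coefficients are those of scales $0,\dots,j_0$, and the discarded energy multiplied by $N^{1/(d-1)}$ should stay roughly constant. The function name `wavelet_tail_model` is hypothetical.

```python
def wavelet_tail_model(d, j0, j_max=60):
    """Heuristic model of the Haar coefficients of chi_B, B a cube or ball in [0,1]^d.

    Assumption (constants set to 1): scale j contributes 2^{j(d-1)} coefficients,
    each of size 2^{-jd/2}.  Keeping the N largest ones means keeping scales
    0..j0; the discarded squared l2-norm is summed up to scale j_max.
    """
    N = sum(2 ** (j * (d - 1)) for j in range(j0 + 1))
    tail = sum(2 ** (j * (d - 1)) * 2.0 ** (-j * d) for j in range(j0 + 1, j_max + 1))
    return N, tail


for d in (2, 3):
    for j0 in (6, 8, 10):
        N, tail = wavelet_tail_model(d, j0)
        # ||f - f_N||^2 ~ N^{-1/(d-1)} means this product stays roughly constant
        print(f"d={d} j0={j0} N={N} tail*N^(1/(d-1))={tail * N ** (1 / (d - 1)):.4f}")
```

For $d = 2$ the printed product stays close to 2, and for $d = 3$ close to $2/\sqrt{3} \approx 1.15$, consistent with the rates $N^{-1}$ and $N^{-1/2}$.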

Remark 1. We end this subsection with a remark on linear approximations. For a linear wavelet approximation of $f$ one would use
$$f \approx \langle f, \phi_{0,0}\rangle\,\phi_{0,0} + \sum_{l=1}^{2^d-1}\sum_{j=0}^{j_0}\sum_{|k| \le 2^j - 1}\langle f, \psi^l_{j,k}\rangle\,\psi^l_{j,k}$$
for some $j_0 > 0$. If restricting to linear approximations, the summation order is not allowed to be changed, and we therefore need to include all coefficients from the first $j_0$ scales. At scale $j \ge 0$, there exist a total of $2^{jd}$ coefficients, which by our previous considerations can be bounded by $C \cdot 2^{-jd/2}$. Hence, on each scale we include $2^j$ times as many coefficients as in the non-linear approximation. This implies that the error rate of the linear $N$-term wavelet approximation is $N^{-1/d}$, which is the same rate as obtained by Fourier approximations.
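The counting in Remark 1 can be checked in the same heuristic way. The sketch below again sets all constants to 1 (an assumption): the linear approximation keeps all $(2^d-1)2^{jd}$ wavelet coefficients of scales $0,\dots,j_0$ plus the scaling coefficient, while the discarded energy is still about $2^{-j}$ per scale, coming from the boundary coefficients. The helper name `linear_wavelet_error_model` is hypothetical.

```python
def linear_wavelet_error_model(d, j0, j_max=60):
    """Heuristic model of *linear* Haar approximation of chi_B in [0,1]^d.

    Assumption (constants set to 1): all (2^d - 1) * 2^{jd} wavelet coefficients
    of scales 0..j0 (plus the one scaling coefficient) are kept, while the
    discarded energy is carried by the 2^{j(d-1)} boundary coefficients of
    size 2^{-jd/2}, i.e. 2^{-j} per scale j > j0.
    """
    N = 1 + sum((2 ** d - 1) * 2 ** (j * d) for j in range(j0 + 1))
    tail = sum(2.0 ** (-j) for j in range(j0 + 1, j_max + 1))
    return N, tail


for d in (2, 3):
    N, tail = linear_wavelet_error_model(d, 10)
    # ||f - f_N||^2 ~ N^{-1/d} means this product stays roughly constant
    print(f"d={d} N={N} tail*N^(1/d)={tail * N ** (1 / d):.4f}")
```

In this model $N = 2^{d(j_0+1)}$, so both products stay close to 2, i.e. the linear error behaves like $N^{-1/d}$, the Fourier rate.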



3.3.3 Key Problem 

The key problem of the suboptimal behavior of Fourier series and wavelet bases is the fact that these systems are not generated by anisotropic elements. Let us illustrate this for 2D in the case of wavelets. Wavelet elements are isotropic due to the scaling matrix $\operatorname{diag}(2^j, 2^j)$. However, already intuitively, approximating a curve with isotropic elements requires many more elements than if the analyzing elements were anisotropic themselves; see Figs. 3 and 4.

Considering wavelets with anisotropic scaling will not remedy the situation, since within one fixed scale one cannot control the direction of the (now anisotropically shaped) elements. Thus, to capture a discontinuity curve as in Fig. 4, one needs not only anisotropic elements, but also a location parameter to locate the elements on the curve and a rotation parameter to align the elongated elements in the direction of the curve.

Figure 3: Isotropic elements capturing a discontinuity curve.
Figure 4: Rotated, anisotropic elements capturing a discontinuity curve.

Let us finally remark why a parabolic scaling matrix $\operatorname{diag}(2^j, 2^{j/2})$ will be natural to use as anisotropic scaling. Since the discontinuity curves of cartoon-like images are $C^2$-smooth with bounded curvature, we may write the curve locally by a Taylor expansion. Let us assume it has the form $(s, E(s))$ with
$$E(s' + s) = E(s') + E'(s')\,s + \tfrac{1}{2}E''(t)\,s^2$$
for small $|s|$ and some $t$ between $s'$ and $s' + s$. Clearly, the translation parameter will be used to position the anisotropic element near $(s', E(s'))$, and the orientation parameter to align it with the tangent direction $(1, E'(s'))$. If the length of the element is $l$, then, due to the term $\frac{1}{2}E''(t)s^2$, the most beneficial height would be $l^2$. And, in fact, parabolic scaling yields precisely this relation, i.e.,
$$\text{height} \approx \text{length}^2.$$

Hence, the main idea in the following will be to design a system which consists 
of anisotropically shaped elements together with a directional parameter to achieve 
the optimal approximation rate for cartoon-like images. 
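The relation height ≈ length² can be checked on the simplest curved edge. The sketch below assumes, for illustration only, that the discontinuity curve is a circle of radius $R$: it measures how far the circle departs from its tangent over a segment of length $2^{-j/2}$ and compares this with the width $2^{-j}$ of a parabolically scaled element. The ratio tends to the constant $1/(8R)$, so the curve stays within a fixed multiple of the element width at every scale. The name `tangent_deviation` is hypothetical.

```python
import math


def tangent_deviation(R, length):
    """Largest distance between a circle of radius R and its tangent line over a
    segment of the given length centred at the point of tangency.

    Uses R - sqrt(R^2 - h^2) = h^2 / (R + sqrt(R^2 - h^2)) to avoid cancellation.
    """
    h = length / 2
    return h * h / (R + math.sqrt(R * R - h * h))


R = 1.0
for j in (8, 12, 16, 20):
    length = 2.0 ** (-j / 2)   # parabolic scaling: element length 2^{-j/2} ...
    width = 2.0 ** (-j)        # ... and width 2^{-j} = length^2
    print(j, round(tangent_deviation(R, length) / width, 5))  # tends to 1/(8R) = 0.125
```

With isotropic elements of side $2^{-j}$ the same ratio would tend to 0, i.e. each element covers a needlessly short, essentially straight piece of the curve.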



4 Pyramid-Adapted Shearlet Systems 

After we have set our benchmark for directional representation systems in the sense of stating an optimality criterion for sparse approximations of the cartoon-like image class $\mathcal{E}^2_L(\mathbb{R}^d)$, we next introduce classes of shearlet systems which we claim behave optimally. As already mentioned in the introduction of this chapter, optimally sparse approximations were proven for a class of band-limited as well as of compactly supported shearlet frames. For the definition of cone-adapted discrete shearlets and, in particular, classes of band-limited as well as of compactly supported shearlet frames leading to optimally sparse approximations, we refer to Chapter 1. In this section, we present the definition of discrete shearlets in 3D, from which the mentioned definitions in the 2D situation can also be directly concluded. As special cases, we then introduce particular classes of band-limited as well as of compactly supported shearlet frames, which will be shown to provide optimal approximations of $\mathcal{E}^2_L(\mathbb{R}^3)$ and, with a slight modification which we will elaborate on in Sect. 5.2.4, also of $\mathcal{E}^{\beta}_{\alpha,L}(\mathbb{R}^3)$ with $1 < \alpha \le \beta \le 2$.



4.1 General Definition 

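The first step in the definition of cone-adapted discrete 2D shearlets was a partitioning of 2D frequency domain into two pairs of high-frequency cones and one low-frequency rectangle. We mimic this step by partitioning 3D frequency domain into the three pairs of pyramids given by
$$\mathcal{P} = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : |\xi_1| \ge 1,\ |\xi_2/\xi_1| \le 1,\ |\xi_3/\xi_1| \le 1\},$$
$$\tilde{\mathcal{P}} = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : |\xi_2| \ge 1,\ |\xi_1/\xi_2| \le 1,\ |\xi_3/\xi_2| \le 1\},$$
$$\breve{\mathcal{P}} = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : |\xi_3| \ge 1,\ |\xi_1/\xi_3| \le 1,\ |\xi_2/\xi_3| \le 1\},$$
and the centered cube
$$\mathcal{C} = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \|(\xi_1,\xi_2,\xi_3)\|_\infty < 1\}.$$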

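This partition is illustrated in Fig. 5, which depicts the three pairs of pyramids, and in Fig. 6, which depicts the centered cube surrounded by the three pairs of pyramids $\mathcal{P}$, $\tilde{\mathcal{P}}$, and $\breve{\mathcal{P}}$.

Figure 5: The partition of the frequency domain: The 'top' of the six pyramids. (a) Pyramid $\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_4$ and the $\xi_1$ axis. (b) Pyramid $\tilde{\mathcal{P}} = \mathcal{P}_2 \cup \mathcal{P}_5$ and the $\xi_2$ axis. (c) Pyramid $\breve{\mathcal{P}} = \mathcal{P}_3 \cup \mathcal{P}_6$ and the $\xi_3$ axis.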

The partitioning of frequency space into pyramids allows us to restrict the range of the shear parameters. Without such a partitioning as, e.g., in shearlet systems arising from the shearlet group, one must allow arbitrarily large shear parameters, which leads to a treatment biased towards one axis. The defined partition, however, enables restriction of the shear parameters to $[-\lceil 2^{j/2}\rceil, \lceil 2^{j/2}\rceil]$, similar to the definition of cone-adapted discrete shearlet systems. We would like to emphasize that this approach is key to providing an almost uniform treatment of different directions in the sense of a 'good' approximation to rotation.
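A small Python sketch of the partition: a frequency point lies in the centered cube if all its coordinates are smaller than 1 in modulus, and otherwise in the pyramid whose coordinate dominates in modulus. The sketch also counts the shear parameters $k = (k_1,k_2)$ with $|k| \le \lceil 2^{j/2}\rceil$ available at scale $j$. Points on the shared pyramid boundaries (ties between coordinates) are assigned to the first matching pyramid, which is an arbitrary convention of this sketch; the labels and function names are hypothetical.

```python
import math


def frequency_region(xi):
    """Classify xi = (xi1, xi2, xi3) into the partition of R^3.

    'C'  : centered cube, max |xi_i| < 1
    'P'  : pyramid where |xi1| dominates (|xi2/xi1| <= 1, |xi3/xi1| <= 1)
    'P~' : pyramid where |xi2| dominates
    'P^' : pyramid where |xi3| dominates (the breve pyramid)
    Ties between coordinates go to the first dominant index.
    """
    a = [abs(x) for x in xi]
    if max(a) < 1:
        return "C"
    return ("P", "P~", "P^")[a.index(max(a))]


def num_shears(j):
    """Number of k = (k1, k2) in Z^2 with |k1|, |k2| <= ceil(2^{j/2})."""
    K = math.ceil(2 ** (j / 2))
    return (2 * K + 1) ** 2


print(frequency_region((0.5, -0.2, 0.9)))   # C
print(frequency_region((5.0, 1.0, -4.0)))   # P
print(frequency_region((1.0, -3.0, 2.9)))   # P~
print([num_shears(j) for j in range(5)])    # [9, 25, 25, 49, 81]
```

The number of shears per pyramid grows like $2^j$, i.e. the angular resolution of each pyramid is refined like $2^{-j/2}$ in each of the two angular directions.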
Figure 6: The partition of the frequency domain: The centered cube $\mathcal{C}$. The arrangement of the six pyramids is indicated by the 'diagonal' lines. See Fig. 5 for a sketch of the pyramids.

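Pyramid-adapted discrete shearlets are scaled according to the paraboloidal scaling matrices $A_{2^j}$, $\tilde{A}_{2^j}$ or $\breve{A}_{2^j}$, $j \in \mathbb{Z}$, defined by
$$A_{2^j} = \begin{pmatrix} 2^j & 0 & 0 \\ 0 & 2^{j/2} & 0 \\ 0 & 0 & 2^{j/2} \end{pmatrix}, \quad \tilde{A}_{2^j} = \begin{pmatrix} 2^{j/2} & 0 & 0 \\ 0 & 2^j & 0 \\ 0 & 0 & 2^{j/2} \end{pmatrix}, \quad \text{and} \quad \breve{A}_{2^j} = \begin{pmatrix} 2^{j/2} & 0 & 0 \\ 0 & 2^{j/2} & 0 \\ 0 & 0 & 2^j \end{pmatrix},$$
and directionality is encoded by the shear matrices $S_k$, $\tilde{S}_k$, or $\breve{S}_k$, $k = (k_1,k_2) \in \mathbb{Z}^2$, given by
$$S_k = \begin{pmatrix} 1 & k_1 & k_2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \tilde{S}_k = \begin{pmatrix} 1 & 0 & 0 \\ k_1 & 1 & k_2 \\ 0 & 0 & 1 \end{pmatrix}, \quad \text{and} \quad \breve{S}_k = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ k_1 & k_2 & 1 \end{pmatrix},$$
respectively. The reader should note that these definitions are (discrete) special cases of the general setup in [2]. The translation lattices will be defined through the following matrices: $M_c = \operatorname{diag}(c_1,c_2,c_2)$, $\tilde{M}_c = \operatorname{diag}(c_2,c_1,c_2)$, and $\breve{M}_c = \operatorname{diag}(c_2,c_2,c_1)$, where $c_1 > 0$ and $c_2 > 0$.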

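We are now ready to introduce 3D shearlet systems, for which we will make use of the vector notation $|k| \le K$ for $k = (k_1,k_2)$ and $K > 0$ to denote $|k_1| \le K$ and $|k_2| \le K$.

Definition 4.1. For $c = (c_1,c_2) \in (\mathbb{R}_+)^2$, the pyramid-adapted discrete shearlet system $\mathcal{SH}(\phi,\psi,\tilde\psi,\breve\psi;c)$ generated by $\phi,\psi,\tilde\psi,\breve\psi \in L^2(\mathbb{R}^3)$ is defined by
$$\mathcal{SH}(\phi,\psi,\tilde\psi,\breve\psi;c) = \Phi(\phi;c_1) \cup \Psi(\psi;c) \cup \tilde\Psi(\tilde\psi;c) \cup \breve\Psi(\breve\psi;c),$$
where
$$\Phi(\phi;c_1) = \{\phi_m = \phi(\cdot - m) : m \in c_1\mathbb{Z}^3\},$$
$$\Psi(\psi;c) = \{\psi_{j,k,m} = 2^j\psi(S_kA_{2^j}\cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j/2}\rceil,\ m \in M_c\mathbb{Z}^3\},$$
$$\tilde\Psi(\tilde\psi;c) = \{\tilde\psi_{j,k,m} = 2^j\tilde\psi(\tilde{S}_k\tilde{A}_{2^j}\cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j/2}\rceil,\ m \in \tilde{M}_c\mathbb{Z}^3\},$$
and
$$\breve\Psi(\breve\psi;c) = \{\breve\psi_{j,k,m} = 2^j\breve\psi(\breve{S}_k\breve{A}_{2^j}\cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j/2}\rceil,\ m \in \breve{M}_c\mathbb{Z}^3\},$$
where $j \in \mathbb{N}_0$ and $k \in \mathbb{Z}^2$. For the sake of brevity, we will sometimes also use the notation $\psi_\lambda$ with $\lambda = (j,k,m)$.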
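The factor $2^j$ in Definition 4.1 is the $L^2$-normalization: since $|\det(S_kA_{2^j})| = 2^{j}\cdot 2^{j/2}\cdot 2^{j/2} = 2^{2j}$, a change of variables gives $\|\psi_{j,k,m}\|_{L^2} = 2^j\,|\det(S_kA_{2^j})|^{-1/2}\|\psi\|_{L^2} = \|\psi\|_{L^2}$. The plain-Python sketch below (helper names are hypothetical) builds the matrices and checks the determinant for a few scales and all admissible shears.

```python
import math


def A(j):
    """Paraboloidal scaling matrix A_{2^j} = diag(2^j, 2^{j/2}, 2^{j/2})."""
    return [[2.0 ** j, 0.0, 0.0], [0.0, 2.0 ** (j / 2), 0.0], [0.0, 0.0, 2.0 ** (j / 2)]]


def S(k1, k2):
    """Shear matrix S_k for the pyramid P."""
    return [[1.0, k1, k2], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(X, Y):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(X[i][t] * Y[t][c] for t in range(3)) for c in range(3)] for i in range(3)]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


for j in range(6):
    K = math.ceil(2 ** (j / 2))
    dets = {round(det3(matmul(S(k1, k2), A(j))), 9)
            for k1 in range(-K, K + 1) for k2 in range(-K, K + 1)}
    print(j, dets, 4 ** j)  # the shear does not change the determinant 2^{2j} = 4^j
```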

We now focus on two different special classes of pyramid-adapted discrete shearlets leading to the class of band-limited shearlets and the class of compactly supported shearlets for which optimality of their approximation properties with respect to cartoon-like images will be proven in Sect. 5.

4.2 Band-Limited 3D Shearlets 

Let the shearlet generator $\psi \in L^2(\mathbb{R}^3)$ be defined by
$$\hat\psi(\xi) = \hat\psi_1(\xi_1)\,\hat\psi_2\Big(\frac{\xi_2}{\xi_1}\Big)\,\hat\psi_2\Big(\frac{\xi_3}{\xi_1}\Big), \qquad (4.1)$$
where $\psi_1$ and $\psi_2$ satisfy the following assumptions:

(i) $\hat\psi_1 \in C^\infty(\mathbb{R})$, $\operatorname{supp}\hat\psi_1 \subset [-4,-\frac12] \cup [\frac12,4]$, and
$$\sum_{j\ge0}|\hat\psi_1(2^{-j}\xi)|^2 = 1 \quad \text{for } |\xi| \ge 1,\ \xi \in \mathbb{R}. \qquad (4.2)$$

(ii) $\hat\psi_2 \in C^\infty(\mathbb{R})$, $\operatorname{supp}\hat\psi_2 \subset [-1,1]$, and
$$\sum_{l=-1}^{1}|\hat\psi_2(\xi + l)|^2 = 1 \quad \text{for } |\xi| \le 1,\ \xi \in \mathbb{R}. \qquad (4.3)$$

Thus, in frequency domain, the band-limited function $\psi \in L^2(\mathbb{R}^3)$ is almost a tensor product of one wavelet with two 'bump' functions, thereby a canonical generalization of the classical band-limited 2D shearlets; see also Chapter 1. This implies that the support in frequency domain has a needle-like shape, with the wavelet acting in the radial direction, ensuring high directional selectivity; see also Fig. 7. The deviation from being a tensor product, i.e., the substitution of $\xi_2$ and $\xi_3$ by the quotients $\xi_2/\xi_1$ and $\xi_3/\xi_1$, respectively, in fact ensures a favorable behavior with respect to the shearing operator, and thus a tiling of frequency domain which leads to a tight frame for $L^2(\mathbb{R}^3)$.

A first step towards this result is the following observation. 
Figure 7: Support of two shearlet elements $\psi_{j,k,m}$ in the frequency domain. The two shearlet elements have the same scale parameter $j = 2$, but different shearing parameters $k = (k_1,k_2)$.



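Theorem 4.1 ([11]). Let $\psi$ be a band-limited shearlet defined as in this subsection. Then the family of functions
$$\Psi(\psi) = \{\psi_{j,k,m} : j \ge 0,\ |k| \le \lceil 2^{j/2}\rceil,\ m \in \tfrac{1}{8}\mathbb{Z}^3\}$$
forms a tight frame for $\check{L}^2(\mathcal{P}) := \{f \in L^2(\mathbb{R}^3) : \operatorname{supp}\hat f \subset \mathcal{P}\}$.

Proof. For each $j \ge 0$, equation (4.3) implies that
$$\sum_{k=-\lceil 2^{j/2}\rceil}^{\lceil 2^{j/2}\rceil}|\hat\psi_2(2^{j/2}\xi + k)|^2 = 1, \quad \text{for } |\xi| \le 1.$$
Hence, using equation (4.2), we obtain
$$\sum_{j\ge0}\sum_{k_1,k_2=-\lceil 2^{j/2}\rceil}^{\lceil 2^{j/2}\rceil}|\hat\psi(S_k^TA_{2^{-j}}\xi)|^2 = \sum_{j\ge0}|\hat\psi_1(2^{-j}\xi_1)|^2\sum_{k_1=-\lceil 2^{j/2}\rceil}^{\lceil 2^{j/2}\rceil}\Big|\hat\psi_2\Big(2^{j/2}\frac{\xi_2}{\xi_1} + k_1\Big)\Big|^2\sum_{k_2=-\lceil 2^{j/2}\rceil}^{\lceil 2^{j/2}\rceil}\Big|\hat\psi_2\Big(2^{j/2}\frac{\xi_3}{\xi_1} + k_2\Big)\Big|^2 = 1,$$
for $\xi = (\xi_1,\xi_2,\xi_3) \in \mathcal{P}$ (the range of $k$ is symmetric, so the sign convention for the shear in the Fourier transform of $\psi_{j,k,m}$ does not matter here). Using this equation together with the fact that $\hat\psi$ is supported inside $[-4,4]^3$ proves the theorem. □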
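The first step of the proof can be checked numerically for a concrete bump. The sketch below uses a hypothetical choice of $|\hat\psi_2|^2$, built from the standard smooth step $v$ with $v(x) + v(1-x) = 1$; this particular construction is an assumption for illustration, not necessarily the one used in the cited work. It is $C^\infty$, supported in $[-1,1]$, and satisfies (4.3). The code then verifies that the shears $|k| \le \lceil 2^{j/2}\rceil$ produce a partition of unity on $|\xi| \le 1$.

```python
import math


def _b(x):
    """C-infinity function that vanishes for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def v(x):
    """Smooth step: v = 0 on (-inf, 0], v = 1 on [1, inf), v(x) + v(1 - x) = 1."""
    return _b(x) / (_b(x) + _b(1.0 - x))


def psi2_hat_sq(xi):
    """Hypothetical |psi2_hat(xi)|^2: smooth, supported in [-1, 1], satisfies (4.3)."""
    return v(1.0 + xi) if xi <= 0 else v(1.0 - xi)


def shear_sum(j, xi):
    """sum_{|k| <= ceil(2^{j/2})} |psi2_hat(2^{j/2} xi + k)|^2, cf. the proof of Thm. 4.1."""
    K = math.ceil(2 ** (j / 2))
    return sum(psi2_hat_sq(2 ** (j / 2) * xi + k) for k in range(-K, K + 1))


grid = [-1 + 2 * i / 400 for i in range(401)]
for j in range(9):
    worst = max(abs(shear_sum(j, xi) - 1.0) for xi in grid)
    print(j, worst)  # deviation from 1 is at the level of rounding errors
```

The range $|k| \le \lceil 2^{j/2}\rceil$ is exactly large enough: for $|\xi| \le 1$ every integer $k$ with $|2^{j/2}\xi + k| < 1$ satisfies $|k| \le \lceil 2^{j/2}\rceil$.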



By Thm. 4.1 and a change of variables, we can construct shearlet frames for $\check{L}^2(\mathcal{P})$, $\check{L}^2(\tilde{\mathcal{P}})$, and $\check{L}^2(\breve{\mathcal{P}})$, respectively. Furthermore, wavelet theory provides us with many choices of $\phi \in L^2(\mathbb{R}^3)$ such that $\Phi(\phi;\frac18)$ forms a frame for $\check{L}^2(\mathcal{C})$. Since $\mathbb{R}^3 = \mathcal{C} \cup \mathcal{P} \cup \tilde{\mathcal{P}} \cup \breve{\mathcal{P}}$ as a disjoint union (up to the common boundaries of the pyramids, which have measure zero), we can express any function $f \in L^2(\mathbb{R}^3)$ as $f = P_{\mathcal{C}}f + P_{\mathcal{P}}f + P_{\tilde{\mathcal{P}}}f + P_{\breve{\mathcal{P}}}f$, where each component corresponds to the orthogonal projection of $f$ onto one of the three pairs of pyramids or the centered cube in the frequency space. We then expand each of these components in terms of the corresponding tight frame. Finally, our representation of $f$ will then be the sum of these four expansions. We remark that the projection of $f$ onto the four subspaces can lead to artificially slowly decaying shearlet coefficients; this will, e.g., be the case if $f$ is in the Schwartz class. This problem does in fact not occur in the construction of compactly supported shearlets.



4.3 Compactly Supported 3D Shearlets 



It is easy to see that the general form (4.1) never leads to a function which is compactly supported in spatial domain. Thus, we need to deviate from this form by now taking exact tensor products as our shearlet generators, which has the additional benefit of leading to fast algorithmic realizations. This, however, causes the problem that the shearlets do not behave as favorably with respect to the shearing operator as in the previous subsection, and the question arises whether they actually do lead to at least a frame for $L^2(\mathbb{R}^3)$. The next result shows this to be true for an even much more general form of shearlet generators, including compactly supported separable generators. The attentive reader will notice that this theorem even covers the class of band-limited shearlets introduced in Sect. 4.2.

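Theorem 4.2 ([15]). Let $\phi, \psi \in L^2(\mathbb{R}^3)$ be functions such that
$$|\hat\phi(\xi)| \le C_1\,\min\{1,|\xi_1|^{-\gamma}\}\cdot\min\{1,|\xi_2|^{-\gamma}\}\cdot\min\{1,|\xi_3|^{-\gamma}\},$$
and
$$|\hat\psi(\xi)| \le C_2\,\min\{1,|\xi_1|^{\delta}\}\cdot\min\{1,|\xi_1|^{-\gamma}\}\cdot\min\{1,|\xi_2|^{-\gamma}\}\cdot\min\{1,|\xi_3|^{-\gamma}\},$$
for some constants $C_1, C_2 > 0$ and $\delta > 2\gamma > 6$. Define $\tilde\psi(x) = \psi(x_2,x_1,x_3)$ and $\breve\psi(x) = \psi(x_3,x_2,x_1)$ for $x = (x_1,x_2,x_3) \in \mathbb{R}^3$. Then there exists a constant $c_0 > 0$ such that the shearlet system $\mathcal{SH}(\phi,\psi,\tilde\psi,\breve\psi;c)$ forms a frame for $L^2(\mathbb{R}^3)$ for all $c = (c_1,c_2)$ with $c_2 \le c_1 \le c_0$, provided that there exists a positive constant $M > 0$ such that
$$|\hat\phi(\xi)|^2 + \sum_{j\ge0}\sum_{k_1,k_2 \in K_j}\Big(|\hat\psi(S_k^TA_{2^{-j}}\xi)|^2 + |\hat{\tilde\psi}(\tilde{S}_k^T\tilde{A}_{2^{-j}}\xi)|^2 + |\hat{\breve\psi}(\breve{S}_k^T\breve{A}_{2^{-j}}\xi)|^2\Big) > M \qquad (4.4)$$
for a.e. $\xi \in \mathbb{R}^3$, where $K_j := [-\lceil 2^{j/2}\rceil, \lceil 2^{j/2}\rceil]$.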

We next provide an example of a family of compactly supported shearlets satisfying the assumptions of Thm. 4.2. However, for applications, one is typically not only interested in whether a system forms a frame, but also in the ratio of the associated frame bounds. In this regard, these shearlets also admit a theoretically derived estimate for this ratio which is reasonably close to 1, i.e., to being tight. As expected, the numerically derived ratio is significantly closer still.
Example 1. Let $K, L \in \mathbb{N}$ be such that $L \ge 10$ and $\frac{3L}{2} \le K \le 3L - 2$, and define a shearlet $\psi \in L^2(\mathbb{R}^3)$ by
$$\hat\psi(\xi) = m_1(4\xi_1)\,\hat\phi(\xi_1)\,\hat\phi(2\xi_2)\,\hat\phi(2\xi_3), \qquad \xi = (\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3, \qquad (4.5)$$
where the function $m_0$ is the low pass filter satisfying
$$|m_0(\xi_1)|^2 = \cos^{2K}(\pi\xi_1)\sum_{n=0}^{L-1}\binom{K-1+n}{n}\sin^{2n}(\pi\xi_1),$$
for $\xi_1 \in \mathbb{R}$, the function $m_1$ is the associated bandpass filter defined by
$$|m_1(\xi_1)|^2 = |m_0(\xi_1 + 1/2)|^2, \qquad \xi_1 \in \mathbb{R},$$
and the scaling function $\phi$ is given by
$$\hat\phi(\xi_1) = \prod_{j=0}^{\infty}m_0(2^{-j}\xi_1), \qquad \xi_1 \in \mathbb{R}.$$

In [13, 15] it is shown that $\phi$ and $\psi$ indeed are compactly supported. Moreover, we have the following result.
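The filter $m_0$ of Example 1 is specified through $|m_0|^2$, which is easy to evaluate. The Python sketch below (function names are hypothetical) evaluates $|m_0|^2$ and $|m_1|^2$, checks $|m_0(0)|^2 = 1$ and $|m_0(1/2)|^2 = 0$, and, as a sanity check of the formula, verifies the classical Daubechies identity $|m_0|^2 + |m_1|^2 = 1$ in the special case $K = L$, which lies outside the parameter range of Example 1; for $K \neq L$ this identity need not hold. The sketch does not compute $m_0$ itself, which would require a spectral factorization, nor the infinite product defining $\hat\phi$.

```python
import math


def m0_sq(xi, K, L):
    """|m_0(xi)|^2 = cos^{2K}(pi xi) * sum_{n=0}^{L-1} binom(K-1+n, n) sin^{2n}(pi xi)."""
    c2 = math.cos(math.pi * xi) ** 2
    s2 = math.sin(math.pi * xi) ** 2
    return c2 ** K * sum(math.comb(K - 1 + n, n) * s2 ** n for n in range(L))


def m1_sq(xi, K, L):
    """|m_1(xi)|^2 = |m_0(xi + 1/2)|^2 (associated bandpass filter)."""
    return m0_sq(xi + 0.5, K, L)


grid = [i / 200 for i in range(201)]   # one period [0, 1] of the 1-periodic filters
K, L = 39, 19                          # parameters used for Table 1
print(m0_sq(0.0, K, L), m0_sq(0.5, K, L))
print(max(abs(m0_sq(x, 10, 10) + m1_sq(x, 10, 10) - 1.0) for x in grid))  # K = L case
```

Since $\sum_{n<L}\binom{K-1+n}{n}y^n \le (1-y)^{-K}$ for $0 \le y < 1$, one always has $0 \le |m_0|^2 \le 1$, which the test below also checks on a grid.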



Theorem 4.3 ([15]). Suppose $\psi \in L^2(\mathbb{R}^3)$ is defined as in (4.5). Then there exists a sampling constant $c_0 > 0$ such that the shearlet system $\Psi(\psi;c)$ forms a frame for $\check{L}^2(\mathcal{P})$ for any translation matrix $M_c$ with $c = (c_1,c_2) \in (\mathbb{R}_+)^2$ and $c_2 \le c_1 \le c_0$.

Proof (sketch). Using upper and lower estimates of the absolute value of the trigonometric polynomial $m_0$ (cf. [13]), one can show that $\psi$ satisfies the hypothesis of Thm. 4.2 as well as
$$\sum_{j\ge0}\sum_{k_1,k_2 \in K_j}|\hat\psi(S_k^TA_{2^{-j}}\xi)|^2 > M \quad \text{for all } \xi \in \mathcal{P},$$
where $M > 0$ is a constant, for some sufficiently small $c_0 > 0$. We note that this inequality is an analog of (4.4) for the pyramid $\mathcal{P}$. Hence, by a result similar to Thm. 4.2, but for the case where we restrict to the pyramid $\check{L}^2(\mathcal{P})$, it then follows that $\Psi(\psi;c)$ is a frame. □

To obtain a frame for all of $L^2(\mathbb{R}^3)$ we simply set $\tilde\psi(x) = \psi(x_2,x_1,x_3)$ and $\breve\psi(x) = \psi(x_3,x_2,x_1)$ as in Thm. 4.2 and choose $\phi(x) = \phi(x_1)\phi(x_2)\phi(x_3)$ as scaling function for $x = (x_1,x_2,x_3) \in \mathbb{R}^3$. Then the corresponding shearlet system $\mathcal{SH}(\phi,\psi,\tilde\psi,\breve\psi;c)$ forms a frame for $L^2(\mathbb{R}^3)$. The proof basically follows from Daubechies' classical estimates for wavelet frames in [5, §3.3.2] and the fact that anisotropic and sheared windows obtained by applying the scaling matrix and the shear matrix to the effective support¹ of $\hat\psi$ cover the pyramid $\mathcal{P}$ in the frequency domain. The same arguments can be applied to each of the shearlet generators $\psi$, $\tilde\psi$ and $\breve\psi$ as well as the scaling function $\phi$ to show a covering of the entire frequency domain and thereby the frame property of the pyramid-adapted shearlet system for $L^2(\mathbb{R}^3)$. We refer to [15] for the detailed proof.

¹ Loosely speaking, we say that $f \in L^2(\mathbb{R}^d)$ has effective support on $B$ if the ratio $\|f\chi_B\|_{L^2}/\|f\|_{L^2}$ is "close" to 1.

Table 1: Frame bound ratio for the shearlet frame from Example 1 with parameters $K = 39$, $L = 19$.

Theoretical (B/A)    Numerical (B/A)    Translation constants (c1, c2)
345.7                13.42              (0.9, 0.25)
226.6                13.17              (0.9, 0.20)
226.4                13.16              (0.9, 0.15)
226.4                13.16              (0.9, 0.10)

Theoretical and numerical estimates of frame bounds for a particular parameter choice are shown in Table 1. We see that the theoretical estimates are overly pessimistic, since they are roughly a factor of 20 larger than the numerically estimated frame bound ratios. We mention that for 2D the estimated frame bound ratios are approximately 1/10 of the ratios found in Table 1.
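The factor quoted above can be recomputed from Table 1. This short Python sketch only divides the transcribed theoretical ratios by the numerical ones; the variable names are illustrative.

```python
# (theoretical B/A, numerical B/A, (c1, c2)) transcribed from Table 1, K = 39, L = 19
table1 = [
    (345.7, 13.42, (0.9, 0.25)),
    (226.6, 13.17, (0.9, 0.20)),
    (226.4, 13.16, (0.9, 0.15)),
    (226.4, 13.16, (0.9, 0.10)),
]

# How much more pessimistic the theoretical frame bound ratio is, row by row
factors = [theo / num for theo, num, _ in table1]
mean_factor = sum(factors) / len(factors)
print([round(f, 1) for f in factors], round(mean_factor, 1))
# prints [25.8, 17.2, 17.2, 17.2] 19.3
```

The factor ranges from about 17 to about 26, with mean about 19, which is the "roughly 20" stated above.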



4.4 Some Remarks on Construction Issues 

The compactly supported shearlets $\psi_{j,k,m}$ from Example 1 are, in spatial domain, of size $2^{-j/2}$ times $2^{-j/2}$ times $2^{-j}$ due to the scaling matrix $A_{2^j}$. This reveals that the shearlet elements will become 'plate-like' as $j \to \infty$. For an illustration, we refer to Fig. 8. Band-limited shearlets, on the other hand, do not have compact support, but their effective support (the region where the energy of the function is concentrated) in spatial domain will likewise be of size $2^{-j/2}$ times $2^{-j/2}$ times $2^{-j}$, owing to their smoothness in frequency domain. Since, intuitively, such plate-like shearlet elements should provide sparse approximations of surface singularities, one could analogously think of using the scaling matrix $A_{2^j} = \operatorname{diag}(2^j, 2^j, 2^{j/2})$, with similar changes for $\tilde{A}_{2^j}$ and $\breve{A}_{2^j}$, to derive 'needle-like' shearlet elements in space domain. These would intuitively behave favorably with respect to the other type of anisotropic features occurring in 3D, that is, curvilinear singularities. Surprisingly, we will show in Sect. 5.2 that for optimally sparse approximation, plate-like shearlets, i.e., shearlets associated with the scaling matrix $A_{2^j} = \operatorname{diag}(2^j, 2^{j/2}, 2^{j/2})$, and similarly $\tilde{A}_{2^j}$ and $\breve{A}_{2^j}$, are sufficient.

Figure 8: Support of a shearlet $\psi_{j,0,m}$ from Example 1.
Let us also mention that, more generally, non-paraboloidal scaling matrices of the form $A_j = \operatorname{diag}(2^j, 2^{\alpha_1 j}, 2^{\alpha_2 j})$ for $0 < \alpha_1, \alpha_2 \le 1$ can be considered. The parameters $\alpha_1$ and $\alpha_2$ allow precise control of the aspect ratio of the shearlet elements, ranging from very plate-like to very needle-like, according to the application at hand, i.e., choosing the shearlet shape that best matches the geometric characteristics of the considered data. The case $\alpha_i < 1$ is covered by the setup of the multidimensional shearlet transform explained in Chapter 2.
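To make the aspect ratios concrete, the sketch below lists the approximate side lengths $2^{-j} \times 2^{-\alpha_1 j} \times 2^{-\alpha_2 j}$ of an unsheared element for the scaling $A_j = \operatorname{diag}(2^j, 2^{\alpha_1 j}, 2^{\alpha_2 j})$. It ignores the shear and the size of the generator's support, both of which only change constants; the function name and the shape labels are hypothetical.

```python
def element_extent(j, a1, a2):
    """Approximate side lengths of a shearlet element for A_j = diag(2^j, 2^{a1 j}, 2^{a2 j})."""
    return (2.0 ** -j, 2.0 ** (-a1 * j), 2.0 ** (-a2 * j))


shapes = {
    "plate-like (paraboloidal)": (0.5, 0.5),   # diag(2^j, 2^{j/2}, 2^{j/2})
    "needle-like": (1.0, 0.5),                 # diag(2^j, 2^j, 2^{j/2})
    "isotropic (wavelet-like)": (1.0, 1.0),    # diag(2^j, 2^j, 2^j)
}
for name, (a1, a2) in shapes.items():
    print(name, element_extent(8, a1, a2))
```

At $j = 8$ the plate-like element is 16 times longer in two directions than in the third, whereas the needle-like element is long in only one direction.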

Let us finish this section with a general thought on the construction of band-limited (not separable) tight shearlet frames versus compactly supported (non-tight, but separable) shearlet frames. There seems to be a trade-off between compact support of the shearlet generators, tightness of the associated frame, and separability of the shearlet generators. In fact, even in 2D, all known constructions of tight shearlet frames use non-separable generators, and these constructions can be shown not to be applicable to compactly supported generators. Presumably, tightness is difficult to obtain while allowing for compactly supported generators, but we can gain separability, which leads to fast algorithmic realizations; see Chapter 3. If we, however, allow non-compactly supported generators, tightness is possible as shown in Sect. 4.2, but separability seems to be out of reach, which causes problems for fast algorithmic realizations.



5 Optimal Sparse Approximations 



In this section, we will show that shearlets, both band-limited and compactly supported as defined in Sect. 4, indeed provide the optimal sparse approximation rate for cartoon-like images from Sect. 3.2. Thus, letting $(\psi_\lambda)_\lambda = (\psi_{j,k,m})_{j,k,m}$ denote the band-limited shearlet frame from Sect. 4.2 or the compactly supported shearlet frame from Sect. 4.3, in both 2D (see Chapter 1) and 3D, and $d \in \{2,3\}$, we aim to prove that
$$\|f - f_N\|_{L^2}^2 \lesssim N^{-\frac{2}{d-1}} \quad \text{for all } f \in \mathcal{E}^2_L(\mathbb{R}^d),$$
where, as discussed in Sect. 3.1, $f_N$ denotes the $N$-term approximation using the $N$ largest coefficients as in (3.5). Hence, in 2D we aim for the rate $N^{-2}$ and in 3D we aim for the rate $N^{-1}$, ignoring log-factors. As mentioned in Sect. 3.2, see (3.1), in order to prove these rates, it suffices to show that the $n$th largest shearlet coefficient $c_n^*$ decays as
$$|c_n^*| \lesssim n^{-\frac{d+1}{2(d-1)}} = \begin{cases} n^{-3/2}, & d = 2,\\ n^{-1}, & d = 3.\end{cases}$$
According to Dfn. 3.1, this will show that among all adaptive and non-adaptive representation systems, shearlet frames behave optimally with respect to sparse approximation of cartoon-like images. That one is able to obtain such an optimal approximation error rate might seem surprising, since the shearlet system as well as the approximation procedure will be non-adaptive.
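The step from the coefficient decay to the approximation rate is the tail estimate $\sum_{n>N} n^{-\frac{d+1}{d-1}} \le \int_N^\infty x^{-\frac{d+1}{d-1}}\,dx = \frac{d-1}{2}N^{-\frac{2}{d-1}}$. The Python sketch below checks it numerically for a sequence with exactly the target decay; this idealized sequence is an assumption, and the sum is truncated at a finite index.

```python
def tail_sum(d, N, n_max=200_000):
    """sum_{N < n <= n_max} |c_n^*|^2 for the model decay |c_n^*| = n^{-(d+1)/(2(d-1))}."""
    p = (d + 1) / (d - 1)
    return sum(n ** -p for n in range(N + 1, n_max + 1))


for d in (2, 3):
    N = 100
    scaled = tail_sum(d, N) * N ** (2 / (d - 1))
    # the integral comparison bounds this scaled tail by (d - 1) / 2
    print(d, round(scaled, 4), (d - 1) / 2)
```

For $N = 100$ the scaled tails come out just below $1/2$ ($d = 2$) and $1$ ($d = 3$), i.e. the bound is sharp up to lower-order terms.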
To present the necessary hypotheses, illustrate the key ideas of the proofs, and discuss the differences between the arguments for band-limited and compactly supported shearlets, we first focus on the situation of 2D shearlets. We then discuss the 3D situation with a condensed proof, mainly discussing the essential differences from the proof for 2D shearlets and highlighting the crucial nature of this case (cf. Sect. [13]).



5.1 Optimal Sparse Approximations in 2D 



As discussed in the previous section, in the case $d = 2$, we aim for the estimates
$$|c_n^*| \lesssim n^{-3/2} \quad \text{and} \quad \|f - f_N\|_{L^2}^2 \lesssim N^{-2}$$
(up to log-factors). In Sect. 5.1.1 we will first provide a heuristic analysis to argue that shearlet frames indeed can deliver these rates. In Sects. 5.1.2 and 5.1.3 we then discuss the required hypotheses and state the main optimality result. The subsequent subsections are then devoted to proving the main result.



5.1.1 A Heuristic Analysis 

We start by giving a heuristic argument (inspired by a similar argument for curvelets in [4]) on why the error $\|f - f_N\|_{L^2}^2$ satisfies the asymptotic rate $N^{-2}$. We emphasize that this heuristic argument applies to both the band-limited and the compactly supported case.

For simplicity we assume $L = 1$, and let $f \in \mathcal{E}^2_L(\mathbb{R}^2)$ be a 2D cartoon-like image. The main concern is to derive the estimate (5.4) for the shearlet coefficients $\langle f, \mathring\psi_{j,k,m}\rangle$, where $\mathring\psi$ denotes either $\psi$ or $\tilde\psi$. We consider only the case $\mathring\psi = \psi$, since the other case can be handled similarly. For compactly supported shearlets, we can think of our generators as having the form $\psi(x) = \eta(x_1)\phi(x_2)$, $x = (x_1,x_2)$, where $\eta$ is a wavelet and $\phi$ a bump (or a scaling) function. It will become important that the wavelet 'points' in the $x_1$-axis direction, which corresponds to the 'short' direction of the shearlet. For band-limited generators, we can think of our generators as having the form $\hat\psi(\xi) = \hat\eta(\xi_1)\hat\phi(\xi_2/\xi_1)$ for $\xi = (\xi_1,\xi_2)$. We, moreover, restrict our analysis to shearlets $\psi_{j,k,m}$, since the frame elements $\tilde\psi_{j,k,m}$ can be handled in a similar way.

We now consider three cases of coefficients $\langle f, \psi_{j,k,m}\rangle$:

(a) Shearlets $\psi_{j,k,m}$ whose support does not overlap with the boundary $\partial B$.

(b) Shearlets $\psi_{j,k,m}$ whose support overlaps with $\partial B$ and is nearly tangent.

(c) Shearlets $\psi_{j,k,m}$ whose support overlaps with $\partial B$, but not tangentially.

It turns out that only coefficients from case (b) will be significant. Case (b) is, loosely speaking, the situation where the wavelet $\eta$ crosses the discontinuity curve over the entire 'height' of the shearlet; see Fig. 9.

Case (a). Since $f$ is $C^2$-smooth away from $\partial B$, the coefficients $|\langle f, \psi_{j,k,m}\rangle|$ will be sufficiently small owing to the approximation property of the wavelet $\eta$. The situation is sketched in Fig. 9.
Figure 9: Sketch of the three cases: (a) the support of $\psi_{j,k,m}$ does not overlap with $\partial B$, (b) the support of $\psi_{j,k,m}$ does overlap with $\partial B$ and is nearly tangent, (c) the support of $\psi_{j,k,m}$ does overlap with $\partial B$, but not tangentially. Note that only a section of the discontinuity curve $\partial B$ is shown, and that for the case of band-limited shearlets only the effective support is shown.



Case (b). At scale $j > 0$, there are about $O(2^{j/2})$ coefficients, since the shearlet elements are of length $2^{-j/2}$ (and 'thickness' $2^{-j}$) and the length of $\partial B$ is finite. By Hölder's inequality, we immediately obtain
$$|\langle f, \psi_{j,k,m}\rangle| \le \|f\|_{L^\infty}\|\psi_{j,k,m}\|_{L^1} \le C_1\,2^{-3j/4}\|\psi\|_{L^1} \le C_2\cdot 2^{-3j/4}$$
for some constants $C_1, C_2 > 0$. In other words, we have $O(2^{j/2})$ coefficients bounded by $C_2\cdot 2^{-3j/4}$. Assuming the case (a) and (c) coefficients are negligible, the $n$th largest coefficient $c_n^*$ is then bounded by
$$|c_n^*| \le C\cdot n^{-3/2},$$
since the first $n \asymp 2^{j/2}$ coefficients come from the scales up to $j$ and $2^{-3j/4} = (2^{j/2})^{-3/2}$. This was what we aimed to show; compare to (3.8) in Dfn. 3.1. This in turn implies (cf. estimate (3.9)) that
$$\sum_{n>N}|c_n^*|^2 \le \sum_{n>N} C\cdot n^{-3} \le C\cdot\int_N^\infty x^{-3}\,dx \le C\cdot N^{-2}.$$
By Lemma 3.1, as desired, it follows that
$$\|f - f_N\|_{L^2}^2 \le \frac{1}{A}\sum_{n>N}|c_n^*|^2 \le C\cdot N^{-2},$$
where $A$ denotes the lower frame bound of the shearlet frame.
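The counting step in case (b) can be imitated numerically. Under the heuristic assumption (constants set to 1) that scale $j$ contributes $\lceil 2^{j/2}\rceil$ coefficients of size exactly $2^{-3j/4}$, the Python sketch below sorts all coefficients and checks that $c_n^* \cdot n^{3/2}$ stays within fixed bounds, i.e. $|c_n^*| \lesssim n^{-3/2}$. The function name is hypothetical.

```python
import math


def shearlet_coefficient_model(J):
    """Heuristic model of the case (b) coefficients (constants set to 1):
    scale j contributes ceil(2^{j/2}) coefficients of size 2^{-3j/4}.
    Returns all coefficients of scales 0..J in decreasing order."""
    coeffs = []
    for j in range(J + 1):
        coeffs += [2.0 ** (-0.75 * j)] * math.ceil(2 ** (j / 2))
    return sorted(coeffs, reverse=True)


c = shearlet_coefficient_model(30)
# |c_n^*| <= C n^{-3/2} means c_n^* * n^{3/2} stays bounded (n counted from 1)
ratios = [c[n - 1] * n ** 1.5 for n in range(100, len(c) + 1)]
print(len(c), round(min(ratios), 2), round(max(ratios), 2))
```

The ratio oscillates within each scale (it is smallest at the first coefficient of a scale and largest at the last) but neither grows nor decays, which is exactly the decay $n^{-3/2}$.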

Case (c). Finally, when the shearlets are sheared away from the tangent position in case (b), they will again be small. This is due to the frequency support of $f$ and $\psi_{j,k,m}$ as well as to the directional vanishing moment conditions assumed in Setup 1 or 2, which will be formally introduced in the next subsection.

Summarizing our findings, we have argued, at least heuristically, that shearlet frames provide optimal sparse approximation of cartoon-like images as defined in Dfn. 3.1.



5.1.2 Required Hypotheses 

After having built up some intuition on why the optimal sparse approximation rate is achievable using shearlets, we will now go into more detail and discuss the hypotheses required for the main result. Along the way, this will already highlight some differences between the band-limited and compactly supported case.




Figure 10: Shaded region: The effective part of $\operatorname{supp}\hat\psi_{j,k,m}$ in the frequency domain.

Figure 11: Shaded region: The effective part of $\operatorname{supp}\psi_{j,k,m}$ in the spatial domain. Dashed lines: the direction of line integration $I(t)$.



For this discussion, assume that $f \in L^2(\mathbb{R}^2)$ is piecewise $C^{L+1}$-smooth with a discontinuity on the line $\mathcal{L}: x_1 = s x_2$, $s \in \mathbb{R}$, so that the function $f$ is well approximated by two 2D polynomials of degree $L > 0$, one polynomial on either side of $\mathcal{L}$, and denote this piecewise polynomial by $q(x_1,x_2)$. We denote the restriction of $q$ to lines $x_1 = s x_2 + t$, $t \in \mathbb{R}$, by $p_t(x_2) = q(s x_2 + t, x_2)$. Hence, $p_t$ is a 1D polynomial along lines parallel to $\mathcal{L}$ going through $(x_1,x_2) = (t,0)$; these lines are marked by dashed lines in Fig. 11.

We now aim at estimating the absolute value of a shearlet coefficient $\langle f, \psi_{j,k,m}\rangle$ by
$$|\langle f, \psi_{j,k,m}\rangle| \le |\langle q, \psi_{j,k,m}\rangle| + |\langle (q - f), \psi_{j,k,m}\rangle|. \qquad (5.1)$$
We first observe that $|\langle (q - f), \psi_{j,k,m}\rangle|$ will be small depending on the approximation quality of the (piecewise) polynomial $q$ and the decay of $\psi$ in the spatial domain. Hence it suffices to focus on estimating $|\langle q, \psi_{j,k,m}\rangle|$.

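For this, let us consider the line integration along the direction $(x_1,x_2) = (s,1)$ as follows: For $t \in \mathbb{R}$ fixed, define the integration of $q\,\psi_{j,k,m}$ along the lines $x_1 = s x_2 + t$, $x_2 \in \mathbb{R}$, as
$$I(t) = \int_{\mathbb{R}} p_t(x_2)\,\psi_{j,k,m}(s x_2 + t, x_2)\,dx_2.$$
Observe that $\langle q, \psi_{j,k,m}\rangle = 0$ is equivalent to $I \equiv 0$. For simplicity, let us now assume $m = (0,0)$. Writing $p_t(x_2) = \sum_{\ell=0}^{L} c_\ell\,x_2^\ell$ (the coefficients $c_\ell$ depend on $t$) and using $S_kA_{2^j} = A_{2^j}S_{k/2^{j/2}}$, we obtain
$$I(t) = 2^{\frac34 j}\int_{\mathbb{R}} p_t(x_2)\,\psi(S_kA_{2^j}(s x_2 + t, x_2))\,dx_2 = 2^{\frac34 j}\sum_{\ell=0}^{L} c_\ell\int_{\mathbb{R}} x_2^\ell\,\psi(S_kA_{2^j}(s x_2 + t, x_2))\,dx_2 = 2^{\frac34 j}\sum_{\ell=0}^{L} c_\ell\int_{\mathbb{R}} x_2^\ell\,\psi(A_{2^j}S_{k/2^{j/2}+s}(t, x_2))\,dx_2,$$
and, by the Fourier slice theorem [12] (see also (5.13)), it follows that
$$|I(t)| = 2^{-\frac34 j}\Big|\sum_{\ell=0}^{L} c_\ell\Big(\frac{i}{2\pi}\Big)^\ell\int_{\mathbb{R}}\Big(\frac{\partial}{\partial\xi_2}\Big)^\ell\hat\psi\big(A_{2^j}^{-1}S_{k/2^{j/2}+s}^{-T}\xi\big)\Big|_{\xi_2=0}\,e^{2\pi i\xi_1 t}\,d\xi_1\Big|.$$
Note that
$$\int_{\mathbb{R}}\Big(\frac{\partial}{\partial\xi_2}\Big)^\ell\hat\psi\big(A_{2^j}^{-1}S_{k/2^{j/2}+s}^{-T}\xi\big)\Big|_{\xi_2=0}\,e^{2\pi i\xi_1 t}\,d\xi_1 = 0 \quad \text{for almost all } t \in \mathbb{R}$$
if and only if
$$\Big(\frac{\partial}{\partial\xi_2}\Big)^\ell\hat\psi\big(A_{2^j}^{-1}S_{k/2^{j/2}+s}^{-T}\xi\big)\Big|_{\xi_2=0} = 0 \quad \text{for almost all } \xi_1 \in \mathbb{R}.$$
Therefore, to ensure $I(t) = 0$ for any 1D polynomial $p_t$ of degree $L > 0$, we require the following condition:
$$\Big(\frac{\partial}{\partial\xi_2}\Big)^\ell\hat\psi_{j,k,0}(\xi_1, -s\xi_1) = 0 \quad \text{for almost all } \xi_1 \in \mathbb{R} \text{ and } \ell = 0,\dots,L.$$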

These are the so-called directional vanishing moments (cf. Q) in the direction 
(5, 1). We now consider the two cases, band-limited shearlets and compactly sup- 
ported shearlets, separately. 

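If $\psi$ is a band-limited shearlet generator, we automatically have
$$\Big(\frac{\partial}{\partial\xi_2}\Big)^\ell\hat\psi_{j,k,m}(\xi_1, -s\xi_1) = 0 \quad \text{for } \ell = 0,\dots,L \quad \text{if } \Big|s + \frac{k}{2^{j/2}}\Big| \ge 2^{-j/2}, \qquad (5.2)$$
since $\operatorname{supp}\hat\psi \subset \mathcal{D}$, where $\mathcal{D} = \{\xi \in \mathbb{R}^2 : |\xi_2/\xi_1| \le 1\}$, as discussed in Chapter 1. Observe that the 'direction' of $\operatorname{supp}\psi_{j,k,m}$ is determined by the line $\mathcal{S}: x_1 = -\frac{k}{2^{j/2}}x_2$. Hence, equation (5.2) implies that, if the direction of $\operatorname{supp}\psi_{j,k,m}$, i.e., of $\mathcal{S}$, is not close to the direction of $\mathcal{L}$ in the sense that $|s + \frac{k}{2^{j/2}}| \ge 2^{-j/2}$, then
$$|\langle q, \psi_{j,k,m}\rangle| = 0.$$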
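Condition (5.2) says that, at scale $j$, only shears with $|s + k2^{-j/2}| < 2^{-j/2}$, equivalently $|2^{j/2}s + k| < 1$, can produce nonzero directional moments along the edge direction $(s,1)$. Since an open interval of length 2 contains at most two integers, at most two shear parameters per scale and location survive. The Python sketch below (function name hypothetical) lists them.

```python
import math


def interacting_shears(s, j):
    """Shear parameters k, |k| <= ceil(2^{j/2}), NOT covered by (5.2), i.e. with
    |s + k 2^{-j/2}| < 2^{-j/2}; for all other k the moments vanish."""
    K = math.ceil(2 ** (j / 2))
    return [k for k in range(-K, K + 1) if abs(s * 2 ** (j / 2) + k) < 1]


print(interacting_shears(0.3, 8))    # [-5, -4]
print(interacting_shears(0.25, 8))   # [-4]
```

All remaining shears at that scale give $\langle q, \psi_{j,k,m}\rangle = 0$, which is why only the nearly tangent case (b) contributes significantly for band-limited shearlets.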
However, if $\psi$ is a compactly supported shearlet generator, equation (5.2) can never hold, since it requires that $\operatorname{supp}\hat\psi \subset \mathcal{D}$. Therefore, for compactly supported generators, we will assume that $(\frac{\partial}{\partial\xi_2})^\ell\hat\psi$, $\ell = 0,1$, has sufficient decay in $\xi_2/\xi_1$ to force $I(t)$, and hence $|\langle q, \psi_{j,k,m}\rangle|$, to be sufficiently small. It should be emphasized that the drawback that $I(t)$ will only be 'small' for compactly supported shearlets (due to the lack of exact directional vanishing moments) will be compensated by the perfect localization property, which still enables optimal sparsity.

Thus, the developed conditions ensure that both terms on the right-hand side of (5.1) can be effectively bounded.



This discussion naturally gives rise to the following hypotheses for optimal sparse approximation. Let us start with the hypotheses for the band-limited case.

Setup 1. The generators $\phi, \psi, \tilde\psi \in L^2(\mathbb{R}^2)$ are band-limited and $C^\infty$ in the frequency domain. Furthermore, the shearlet system $\mathcal{SH}(\phi,\psi,\tilde\psi;c)$ forms a frame for $L^2(\mathbb{R}^2)$ (cf. the construction in Chapter 1 or Sect. 4.2).



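In contrast to this, the conditions for the compactly supported shearlets are as follows:

Setup 2. The generators $\phi, \psi, \tilde\psi \in L^2(\mathbb{R}^2)$ are compactly supported, and the shearlet system $\mathcal{SH}(\phi,\psi,\tilde\psi;c)$ forms a frame for $L^2(\mathbb{R}^2)$. Furthermore, for all $\xi = (\xi_1,\xi_2) \in \mathbb{R}^2$, the function $\psi$ satisfies

(i) $|\hat\psi(\xi)| \le C\cdot\min\{1,|\xi_1|^\delta\}\cdot\min\{1,|\xi_1|^{-\gamma}\}\cdot\min\{1,|\xi_2|^{-\gamma}\}$, and

(ii) $\Big|\frac{\partial}{\partial\xi_2}\hat\psi(\xi)\Big| \le |h(\xi_1)|\Big(1 + \frac{|\xi_2|}{|\xi_1|}\Big)^{-\gamma}$,

where $\delta > 6$, $\gamma \ge 3$, $h \in L^1(\mathbb{R})$, and $C$ is a constant, and $\tilde\psi$ satisfies analogous conditions with the obvious change of coordinates (cf. the construction in Sect. 4.3).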



Conditions (i) and (ii) in Setup 2 are exactly the decay assumptions on $(\frac{\partial}{\partial\xi_2})^\ell\hat\psi$, $\ell = 0,1$, discussed above that guarantee control of the size of $I(t)$.



5.1.3 Main Result 



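We are now ready to present the main result, which states that under Setup 1 or Setup 2 shearlets provide optimally sparse approximations for cartoon-like images.

Theorem 5.1 ([10, 17]). Assume Setup 1 or 2. Let $L \in \mathbb{N}$. For any $\nu > 0$ and $\mu > 0$, the shearlet frame $\mathcal{SH}(\phi,\psi,\tilde\psi;c)$ provides optimally sparse approximations of functions $f \in \mathcal{E}^2_L(\mathbb{R}^2)$ in the sense of Dfn. 3.1, i.e.,
$$\|f - f_N\|_{L^2}^2 = O(N^{-2}(\log N)^3) \quad \text{as } N \to \infty, \qquad (5.3)$$
and
$$|c_n^*| \lesssim n^{-3/2}(\log n)^{3/2} \quad \text{as } n \to \infty, \qquad (5.4)$$
where $c = \{\langle f, \mathring\psi_\lambda\rangle : \lambda \in \Lambda,\ \mathring\psi = \psi \text{ or } \mathring\psi = \tilde\psi\}$ and $c^* = (c_n^*)_{n\in\mathbb{N}}$ is a decreasing (in modulus) rearrangement of $c$.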
5.1.4 Band-Limitedness versus Compactly Supportedness

Before we delve into the proof of Thm. 5.1, we first carefully discuss the main differences between band-limited shearlets and compactly supported shearlets which require adaptions of the proof.

In the case of compactly supported shearlets, we can consider the two cases $|\operatorname{supp}\psi_\lambda \cap \partial B| \ne 0$ and $|\operatorname{supp}\psi_\lambda \cap \partial B| = 0$. In case the support of the shearlet intersects the discontinuity curve $\partial B$ of the cartoon-like image $f$, we will estimate each shearlet coefficient $\langle f, \psi_\lambda\rangle$ individually using the decay assumptions on $\hat\psi$ in Setup 2, and then apply a simple counting estimate to obtain the sought estimates (5.3) and (5.4). In the other case, in which the shearlet does not interact with the discontinuity, we are simply estimating the decay of shearlet coefficients of a $C^2$ function. The argument here is similar to the approximation of smooth functions using wavelet frames and relies on estimating coefficients at all scales using the frame property.

In the case of band-limited shearlets, it is not possible to consider the two cases $|\operatorname{supp}\psi_\lambda \cap \partial B| = 0$ and $|\operatorname{supp}\psi_\lambda \cap \partial B| \ne 0$ separately, since all shearlet elements $\psi_\lambda$ intersect the boundary of the set $B$ (a nonzero band-limited function cannot be compactly supported). In fact, one needs to first localize the cartoon-like image $f$ by compactly supported smooth window functions associated with dyadic squares using a partition of unity. Letting $f_Q$ denote such a localized version, we then estimate $\langle f_Q, \psi_\lambda\rangle$ instead of directly estimating the shearlet coefficients $\langle f, \psi_\lambda\rangle$. Moreover, in the case of band-limited shearlets, one needs to estimate the sparsity of the sequence of the shearlet coefficients rather than analyzing the decay of individual coefficients.

In the next subsections we present the proof, first for band-limited and then for compactly supported shearlets, in the case $L = 1$, i.e., when the discontinuity curve in the model of cartoon-like images is smooth. Finally, the extension to $L \ne 1$ will be discussed for both cases simultaneously.

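We will first, however, introduce some notation used in the proofs and prove a helpful lemma which will be used in both cases: band-limited and compactly supported shearlets. For a fixed $j$, we let $\mathcal{Q}_j$ be a collection of dyadic squares defined by

$$\mathcal{Q}_j = \left\{Q = \left[\frac{l_1}{2^{j/2}}, \frac{l_1+1}{2^{j/2}}\right] \times \left[\frac{l_2}{2^{j/2}}, \frac{l_2+1}{2^{j/2}}\right] : l_1, l_2 \in \mathbb{Z}\right\}.$$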
We let $\Lambda$ denote the set of all indices $(j, k, m)$ in the shearlet system and define

$$\Lambda_j = \{(j, k, m) \in \Lambda : -\lceil 2^{j/2}\rceil \le k \le \lceil 2^{j/2}\rceil,\ m \in \mathbb{Z}^2\}.$$

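For $\varepsilon > 0$, we define the set of 'relevant' indices on scale $j$ as

$$\Lambda_j(\varepsilon) = \{\lambda \in \Lambda_j : |\langle f, \psi_\lambda\rangle| > \varepsilon\}$$

and, on all scales, as

$$\Lambda(\varepsilon) = \{\lambda \in \Lambda : |\langle f, \psi_\lambda\rangle| > \varepsilon\}.$$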



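Lemma 5.2. Assume Setup 1 or 2. Let $f \in \mathcal{E}^2_L(\mathbb{R}^2)$. Then the following assertions hold:

(i) For some constant $C$, we have

$$\#|\Lambda_j(\varepsilon)| = 0 \quad \text{for } j \ge \tfrac{4}{3}\log_2(\varepsilon^{-1}) + C. \tag{5.5}$$

(ii) If

$$\#|\Lambda_j(\varepsilon)| \lesssim \varepsilon^{-2/3} \tag{5.6}$$

for $j \ge 0$, then

$$\#|\Lambda(\varepsilon)| \lesssim \varepsilon^{-2/3}\log_2(\varepsilon^{-1}), \tag{5.7}$$

which, in turn, implies (5.3) and (5.4).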



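Proof. (i). Since $\psi \in L^1(\mathbb{R}^2)$ for both the band-limited and the compactly supported setup, we have that

$$|\langle f, \psi_\lambda\rangle| = \left|\int_{\mathbb{R}^2} f(x)\,2^{3j/4}\psi(S_k A_{2^j}x - m)\,dx\right| \le 2^{3j/4}\|f\|_{\infty}\int_{\mathbb{R}^2}|\psi(S_k A_{2^j}x - m)|\,dx = 2^{-3j/4}\|f\|_{\infty}\|\psi\|_{L^1}. \tag{5.8}$$

As a consequence, there is a scale $j_\varepsilon$ such that $|\langle f, \psi_\lambda\rangle| < \varepsilon$ for each $j \ge j_\varepsilon$. It therefore follows from (5.8) that

$$\#|\Lambda_j(\varepsilon)| = 0 \quad \text{for } j > \tfrac{4}{3}\log_2(\varepsilon^{-1}) + C.$$

(ii). By assertion (i) and estimate (5.6), we have that

$$\#|\Lambda(\varepsilon)| \le C\varepsilon^{-2/3}\log_2(\varepsilon^{-1}).$$

From this, the value $\varepsilon$ can be written as a function of the total number of coefficients $n = \#|\Lambda(\varepsilon)|$. We obtain

$$\varepsilon(n) \le C n^{-3/2}(\log_2(n))^{3/2} \quad \text{for sufficiently large } n.$$

This implies that

$$|c_n^*| \le C n^{-3/2}(\log_2(n))^{3/2}$$

and

$$\sum_{n > N}|c_n^*|^2 \le C N^{-2}(\log_2(N))^3 \quad \text{for sufficiently large } N > 0,$$

where $c_n^*$ as usual denotes the $n$th largest shearlet coefficient in modulus.

□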
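The inversion step in (ii) can be tried out with a minimal Python sketch. It sets all implicit constants to 1, which is an illustrative assumption and not taken from the source, and the helper names are hypothetical. The sketch inverts $n = \varepsilon^{-2/3}\log_2(\varepsilon^{-1})$ by bisection and checks that $\varepsilon(n)/(n^{-3/2}(\log_2 n)^{3/2})$ stays bounded. It also compares a truncated tail $\sum_{n>N} n^{-3}(\ln n)^3$ with $N^{-2}(\ln N)^3$.

```python
import math


def count_model(t: float) -> float:
    """Model of (5.7) with constants set to 1, written in t = log2(1/eps):
    n = eps^(-2/3) * log2(1/eps) = 2^(2t/3) * t (increasing in t)."""
    return 2.0 ** (2.0 * t / 3.0) * t


def threshold_for_count(n: float) -> float:
    """Solve count_model(t) = n (for n >= count_model(1)) by bisection; return eps = 2^(-t)."""
    lo, hi = 1.0, 2.0
    while count_model(hi) < n:  # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_model(mid) < n:
            lo = mid
        else:
            hi = mid
    return 2.0 ** (-hi)


def rate_ratio(n: float) -> float:
    """eps(n) divided by the claimed rate n^(-3/2) * (log2 n)^(3/2)."""
    return threshold_for_count(n) / (n ** -1.5 * math.log2(n) ** 1.5)


def tail_ratio(N: int, factor: int = 200) -> float:
    """Truncated tail sum of n^-3 (ln n)^3 over N < n <= factor*N, divided by N^-2 (ln N)^3."""
    tail = sum(math.log(n) ** 3 / n ** 3 for n in range(N + 1, factor * N + 1))
    return tail / (N ** -2 * math.log(N) ** 3)


if __name__ == "__main__":
    for n in (10, 10**3, 10**6, 10**9):
        print(n, round(rate_ratio(n), 4))
    for N in (100, 1000):
        print(N, round(tail_ratio(N), 4))
```

In this model $\varepsilon(n) = (t/n)^{3/2}$ with $t = \log_2(\varepsilon^{-1})$. Because $\log_2 n = \tfrac{2t}{3} + \log_2 t$, we have $t < \tfrac32\log_2 n$, so the ratio stays below $(3/2)^{3/2} \approx 1.84$.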
5.1.5 Proof for Band-Limited Shearlets for L = 1

Since we assume $L = 1$, we have that $f \in \mathcal{E}^2_1(\mathbb{R}^2) = \mathcal{E}^2(\mathbb{R}^2)$. As mentioned in the previous section, we will now measure the sparsity of the shearlet coefficients $\{\langle f, \mathring{\psi}_\lambda\rangle : \lambda \in \Lambda\}$. For this, we will use the weak $\ell^p$ quasi norm $\|\cdot\|_{w\ell^p}$, defined as follows. For a sequence $s = (s_i)_{i\in I}$, we let, as usual, $s_n^*$ be the $n$th largest coefficient in $s$ in modulus. We then define:

$$\|s\|_{w\ell^p} = \sup_{n > 0} n^{1/p}|s_n^*|.$$

One can show [19] that this definition is equivalent to

$$\|s\|_{w\ell^p} = \left(\sup\left\{\#|\{i : |s_i| > \varepsilon\}|\,\varepsilon^p : \varepsilon > 0\right\}\right)^{1/p}.$$
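The two descriptions of $\|\cdot\|_{w\ell^p}$ can be compared on finite sequences with a short Python sketch (the function names are illustrative). The counting form also shows why a weak $\ell^p$ bound controls the number of large coefficients: $\#\{i : |s_i| > \varepsilon\} \le \|s\|_{w\ell^p}^p\,\varepsilon^{-p}$.

```python
from typing import Sequence


def weak_lp_by_rearrangement(s: Sequence[float], p: float) -> float:
    """sup_n n^(1/p) |s*_n| for a finite sequence, where s*_n is the n-th largest |s_i|."""
    mags = sorted((abs(x) for x in s), reverse=True)
    return max((n ** (1.0 / p) * v for n, v in enumerate(mags, start=1)), default=0.0)


def weak_lp_by_counting(s: Sequence[float], p: float) -> float:
    """(sup_eps #{i : |s_i| > eps} * eps^p)^(1/p) for a finite sequence.

    Between consecutive distinct magnitudes the count is constant, so the supremum
    is approached as eps increases to a value v = |s_i|. There the count equals
    #{i : |s_i| >= v}."""
    mags = [abs(x) for x in s]
    best = 0.0
    for v in set(mags):
        if v == 0.0:
            continue
        count = sum(1 for m in mags if m >= v)
        best = max(best, count * v ** p)
    return best ** (1.0 / p)


def count_above(s: Sequence[float], eps: float) -> int:
    """#{i : |s_i| > eps}."""
    return sum(1 for x in s if abs(x) > eps)


if __name__ == "__main__":
    s = [n ** -1.5 for n in range(1, 200)]  # s*_n = n^(-3/2) has weak l^(2/3) norm 1
    print(weak_lp_by_rearrangement(s, 2 / 3), weak_lp_by_counting(s, 2 / 3))
```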



We will only consider the case $\mathring{\psi} = \psi$, since the case $\mathring{\psi} = \tilde\psi$ can be handled similarly. To analyze the decay properties of the shearlet coefficients $(\langle f, \psi_\lambda\rangle)_\lambda$ at a given scale parameter $j \ge 0$, we smoothly localize the function $f$ near dyadic squares. Fix the scale parameter $j \ge 0$. For a non-negative $C^\infty$ function $w$ with support in $[0,1]^2$, we then define a smooth partition of unity

$$\sum_{Q \in \mathcal{Q}_j} w_Q(x) = 1, \quad x \in \mathbb{R}^2,$$

where, for each dyadic square $Q \in \mathcal{Q}_j$, $w_Q(x) = w(2^{j/2}x_1 - l_1, 2^{j/2}x_2 - l_2)$. We will then examine the shearlet coefficients of the localized function $f_Q := f w_Q$. With this smooth localization of the function $f$, we can now consider the two separate cases, $|\operatorname{supp} w_Q \cap \partial B| = 0$ and $|\operatorname{supp} w_Q \cap \partial B| \ne 0$. Let

$$\mathcal{Q}_j = \mathcal{Q}_j^0 \cup \mathcal{Q}_j^1,$$

where the union is disjoint and $\mathcal{Q}_j^0$ is the collection of those dyadic squares $Q \in \mathcal{Q}_j$ such that the edge curve $\partial B$ intersects the support of $w_Q$. Since each $Q$ has side length $2^{-j/2}$ and the edge curve $\partial B$ has finite length, it follows that

$$\#|\mathcal{Q}_j^0| \lesssim 2^{j/2}. \tag{5.9}$$

Similarly, since $f$ is compactly supported in $[0,1]^2$, we see that

$$\#|\mathcal{Q}_j^1| \lesssim 2^{j}. \tag{5.10}$$
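As a rough check of (5.9) and (5.10), the following sketch counts dyadic squares of side $2^{-j/2}$ that meet a circle standing in for the edge curve $\partial B$. The circle, its radius and the sampling density are assumptions made only for illustration. Counting the squares hit by densely sampled curve points can slightly undercount the squares the curve merely grazes. That is good enough to observe the $2^{j/2}$ growth.

```python
import math


def squares_meeting_circle(j: int, center=(0.5, 0.5), radius=0.3, samples_per_side=20) -> int:
    """Number of dyadic squares of side 2^(-j/2) containing at least one sampled point
    of the circle |x - center| = radius (a stand-in for the edge curve)."""
    inv_side = 2.0 ** (j / 2.0)
    perimeter = 2.0 * math.pi * radius
    n_samples = max(1000, int(samples_per_side * perimeter * inv_side))
    hit = set()
    for i in range(n_samples):
        t = 2.0 * math.pi * i / n_samples
        x1 = center[0] + radius * math.cos(t)
        x2 = center[1] + radius * math.sin(t)
        hit.add((math.floor(x1 * inv_side), math.floor(x2 * inv_side)))  # index (l1, l2)
    return len(hit)


def squares_in_unit_square(j: int) -> int:
    """Number of dyadic squares of side 2^(-j/2) needed to cover [0, 1]^2."""
    return math.ceil(2.0 ** (j / 2.0)) ** 2


if __name__ == "__main__":
    for j in range(2, 21, 4):
        print(j, squares_meeting_circle(j) / 2 ** (j / 2), squares_in_unit_square(j) / 2 ** j)
```

For this circle the ratio of the edge count to $2^{j/2}$ stays bounded as $j$ grows, while the total count is exactly $2^j$ for even $j$.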

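The following theorems analyze the sparsity of the shearlet coefficients for each dyadic square $Q \in \mathcal{Q}_j$.

Theorem 5.3 ([10]). Let $f \in \mathcal{E}^2(\mathbb{R}^2)$. For $Q \in \mathcal{Q}_j^0$ with $j \ge 0$ fixed, the sequence of shearlet coefficients $\{d_\lambda := \langle f_Q, \psi_\lambda\rangle : \lambda \in \Lambda_j\}$ obeys

$$\|(d_\lambda)_{\lambda\in\Lambda_j}\|_{w\ell^{2/3}} \lesssim 2^{-3j/4}.$$

Theorem 5.4 ([10]). Let $f \in \mathcal{E}^2(\mathbb{R}^2)$. For $Q \in \mathcal{Q}_j^1$ with $j \ge 0$ fixed, the sequence of shearlet coefficients $\{d_\lambda := \langle f_Q, \psi_\lambda\rangle : \lambda \in \Lambda_j\}$ obeys

$$\|(d_\lambda)_{\lambda\in\Lambda_j}\|_{w\ell^{2/3}} \lesssim 2^{-3j/2}.$$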

As a consequence of these two theorems, we have the following result. 



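Theorem 5.5 ([10]). Suppose $f \in \mathcal{E}^2(\mathbb{R}^2)$. Then, for $j \ge 0$, the sequence of the shearlet coefficients $\{c_\lambda := \langle f, \psi_\lambda\rangle : \lambda \in \Lambda_j\}$ obeys

$$\|(c_\lambda)_{\lambda\in\Lambda_j}\|_{w\ell^{2/3}} \lesssim 1.$$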



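Proof. Using Thm. 5.3 and 5.4, and the $p$-triangle inequality for weak $\ell^p$ spaces, $p \le 1$, we have

$$\begin{aligned}\|(\langle f, \psi_\lambda\rangle)_\lambda\|_{w\ell^{2/3}}^{2/3} &\le \sum_{Q\in\mathcal{Q}_j}\|(\langle f_Q, \psi_\lambda\rangle)_\lambda\|_{w\ell^{2/3}}^{2/3} \\ &= \sum_{Q\in\mathcal{Q}_j^0}\|(\langle f_Q, \psi_\lambda\rangle)_\lambda\|_{w\ell^{2/3}}^{2/3} + \sum_{Q\in\mathcal{Q}_j^1}\|(\langle f_Q, \psi_\lambda\rangle)_\lambda\|_{w\ell^{2/3}}^{2/3} \\ &\lesssim \#|\mathcal{Q}_j^0|\,2^{-j/2} + \#|\mathcal{Q}_j^1|\,2^{-j}.\end{aligned}$$

Equations (5.9) and (5.10) complete the proof.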



□ 
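The proof above depends on an exact cancellation of exponents. The sketch below assumes the per-square weak $\ell^{2/3}$ bounds $2^{-3j/4}$ for $Q \in \mathcal{Q}_j^0$ and $2^{-3j/2}$ for $Q \in \mathcal{Q}_j^1$ (the bounds from [10] stated in Thms. 5.3 and 5.4), together with the counts (5.9) and (5.10). It does the exponent bookkeeping with exact rational arithmetic, and the helper name is illustrative.

```python
from fractions import Fraction


def exponent_of_sum(count_exp: Fraction, norm_exp: Fraction, p: Fraction) -> Fraction:
    """Exponent of 2^j in  #squares * (per-square weak l^p bound)^p, when
    #squares ~ 2^(count_exp * j) and the per-square bound is 2^(norm_exp * j)."""
    return count_exp + p * norm_exp


p = Fraction(2, 3)
edge = exponent_of_sum(Fraction(1, 2), Fraction(-3, 4), p)  # (5.9) with the Q_j^0 bound
smooth = exponent_of_sum(Fraction(1), Fraction(-3, 2), p)   # (5.10) with the Q_j^1 bound
print(edge, smooth)  # prints: 0 0, so both sums are O(1) uniformly in j
```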



We can now prove Thm. 5.1 for the band-limited setup.

Proof of Thm. 5.1 for Setup 1. From Thm. 5.5, we have that

$$\#|\Lambda_j(\varepsilon)| \le C\varepsilon^{-2/3}$$

for some constant $C > 0$, which, by Lemma 5.2, completes the proof.



□ 



5.1.6 Proof for Compactly Supported Shearlets for L = 1 



To derive the sought estimates (5.3) and (5.4) for dimension $d = 2$, we will study two separate cases: those shearlet elements $\psi_\lambda$ which do not interact with the discontinuity curve, and those elements which do.

Case 1. The compact support of the shearlet $\psi_\lambda$ does not intersect the boundary of the set $B$, i.e., $|\operatorname{supp}\psi_\lambda \cap \partial B| = 0$.

Case 2. The compact support of the shearlet $\psi_\lambda$ does intersect the boundary of the set $B$, i.e., $|\operatorname{supp}\psi_\lambda \cap \partial B| \ne 0$.

For Case 1 we will not be concerned with decay estimates of single coefficients $\langle f, \psi_\lambda\rangle$, but with the decay of sums of coefficients over several scales and all shears and translations. The frame property of the shearlet system, the $C^2$-smoothness of $f$, and a crude counting argument for the cardinality of the essential indices $\lambda$ will be enough to provide the needed approximation rate. The proof of this is similar to estimates of the decay of wavelet coefficients for smooth functions. In fact, shearlet and wavelet frames give the same approximation decay rates in this case. Due to space limitations of this exposition, we will not go into the details of this estimate, but rather focus on the main part of the proof, Case 2.

For Case 2 we need to estimate each coefficient $\langle f, \psi_\lambda\rangle$ individually and, in particular, how $|\langle f, \psi_\lambda\rangle|$ decays with scale $j$ and shearing $k$. Without loss of generality we can assume that $f = f_0 + \chi_B f_1$ with $f_0 = 0$. We then let $M$ denote the area of integration in $\langle f, \psi_\lambda\rangle$, that is,

$$M = \operatorname{supp}\psi_\lambda \cap B.$$



Further, let $\mathcal{L}$ be an affine hyperplane (in other and simpler words, a line in $\mathbb{R}^2$) that intersects $M$ and thereby divides $M$ into two sets $M_t$ and $M_l$, see the sketch in Fig. 12. We thereby have that

$$\langle f, \psi_\lambda\rangle = \langle \chi_{M_t} f, \psi_\lambda\rangle + \langle \chi_{M_l} f, \psi_\lambda\rangle. \tag{5.11}$$

The hyperplane will be chosen in such a way that the area of $M_t$ is sufficiently small. In particular, $\operatorname{area}(M_t)$ should be small enough so that the following estimate

$$|\langle \chi_{M_t} f, \psi_\lambda\rangle| \le \|f\|_{L^\infty}\|\psi_\lambda\|_{L^\infty}\operatorname{area}(M_t) \lesssim \mu\,2^{3j/4}\operatorname{area}(M_t) \tag{5.12}$$

does not violate (5.4). If the hyperplane $\mathcal{L}$ is positioned as indicated in Fig. 12, it can indeed be shown by crudely estimating $\operatorname{area}(M_t)$ that (5.12) does not violate estimate (5.4). We call estimates of this form, where we have restricted the integration to a small part $M_t$ of $M$, truncated estimates. Hence, in the following we assume that (5.11) reduces to $\langle f, \psi_\lambda\rangle = \langle \chi_{M_l} f, \psi_\lambda\rangle$.




Figure 12: Sketch of $\operatorname{supp}\psi_\lambda$, $M_l$, $M_t$, and $\mathcal{L}$, together with the new origin. The lines of integration are shown.



For the term {XMif, Wx) we will have to integrate over a possibly much large 
part Ml of M. To handle this, we will use that x^fx only interacts with the discontinu- 
ity of XmJ along a line inside M. This part of the estimate is called the linearized 
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estimate, since the discontinuity curve in {XMif, Wl) has been reduced to a line. 
Ill (Zm,/, Wx) we are, of course, integrating over two variables, and we will as the 
inner integration always choose to integrate along lines parallel to the 'singularity' 



line see Fig. 12 The important point here is that along these lines, the function 
/ is C^-smooth without discontinuities on the entire interval of integration. This is 
exactly the reason for removing the A// -part from M. Using the Fourier slice theo- 
rem we will then turn the line integrations along =Sf in the spatial domain into line 
integrations in the frequency domain. The argumentation is as follows: Consider 
g : — )■ C compactly supported and continuous, and let p : M — )• C be a projection 
of g onto, say, the X2 axis, i.e., p{xi) = J^g{xi,X2)dx2. This immediately implies 
that p{^i) = g{^i,0) which is a simplified version of the Fourier slice theorem. By 
an inverse Fourier transform, we then have 

/ g{xux2)dx2 = p{xi)= [ |(<^i,0)e2^">^id^i, (5.13) 

JR JR 



and hence 



/ \gixuX2)\dX2 = I |g(^i,0)|d^i. (5.14) 

JR JR 



The left-hand side of (5. 14 1 corresponds to line integrations of g along vertical lines 
x\ = constant. By applying shearing to the coordinates x E M?, we can transform 



^ into a line of the form {a: G : = constant}, whereby we can apply (5.14) 
directly. 
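The simplified Fourier slice theorem has an exact discrete counterpart that is easy to check. The one-dimensional DFT of the sums of an array along $x_2$ equals the $\xi_2 = 0$ column of its two-dimensional DFT. The NumPy sketch below checks this identity and the discrete analogue of (5.14). The grid size, the random data and the support block are arbitrary choices.

```python
import numpy as np


def projection_slice_demo(n: int = 64, seed: int = 0):
    """Discrete check of p_hat(xi_1) = g_hat(xi_1, 0) for p(x_1) = sum_{x_2} g(x_1, x_2).

    Axis 0 of g is x_1 and axis 1 is x_2; g vanishes outside a block to mimic
    a compactly supported function."""
    rng = np.random.default_rng(seed)
    g = np.zeros((n, n))
    a, b, c, d = n // 4, 3 * n // 4, n // 8, n // 2
    g[a:b, c:d] = rng.standard_normal((b - a, d - c))
    p = g.sum(axis=1)                    # discrete line integrals along x_1 = const
    p_hat = np.fft.fft(p)                # 1D DFT of the projection
    g_hat_slice = np.fft.fft2(g)[:, 0]   # 2D DFT on the slice xi_2 = 0
    return p, p_hat, g_hat_slice


p, p_hat, g_hat_slice = projection_slice_demo()
n = p.size
# discrete Fourier slice theorem, cf. (5.13)
assert np.allclose(p_hat, g_hat_slice)
# discrete analogue of (5.14): |p(x_1)| <= (1/n) * sum_xi |g_hat(xi, 0)|
assert np.all(np.abs(p) <= np.abs(g_hat_slice).sum() / n + 1e-12)
```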

We will make this idea more concrete in the proof of the following key estimate for linearized terms of the form $\langle \chi_{M_l} f, \psi_\lambda\rangle$. Since we regard the truncated estimate as negligible, this will in fact allow us to estimate $\langle f, \psi_\lambda\rangle$.

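Theorem 5.6. Let $\psi \in L^2(\mathbb{R}^2)$ be compactly supported, and assume that $\psi$ satisfies the conditions in Setup 2. Further, let $\lambda$ be such that $\operatorname{supp}\psi_\lambda \cap \partial B \ne \emptyset$. Suppose that $f \in \mathcal{E}^2(\mathbb{R}^2)$ and that $\partial B$ is linear on the support of $\psi_\lambda$ in the sense

$$\operatorname{supp}\psi_\lambda \cap \partial B \subset \mathcal{L}$$

for some affine hyperplane $\mathcal{L}$ of $\mathbb{R}^2$. Then,

(i) if $\mathcal{L}$ has normal vector $(-1, s)$ with $|s| \le 3$,
$$|\langle f, \psi_\lambda\rangle| \lesssim \frac{2^{-3j/4}}{|k + 2^{j/2}s|^3},$$

(ii) if $\mathcal{L}$ has normal vector $(-1, s)$ with $|s| \ge 3/2$,
$$|\langle f, \psi_\lambda\rangle| \lesssim 2^{-9j/4},$$

(iii) if $\mathcal{L}$ has normal vector $(0, s)$ with $s \in \mathbb{R}$, then
$$|\langle f, \psi_\lambda\rangle| \lesssim 2^{-11j/4}.$$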
Proof. Fix $\lambda$, and let $f \in \mathcal{E}^2(\mathbb{R}^2)$. We can without loss of generality assume that $f$ is only nonzero on $B$.

Cases (i) and (ii). We first consider the cases (i) and (ii). In these cases, the hyperplane can be written as

$$\mathcal{L} = \{x \in \mathbb{R}^2 : \langle x - x_0, (-1, s)\rangle = 0\}$$

for some $x_0 \in \mathbb{R}^2$. We shear the hyperplane by $S_{-s}$ for $s \in \mathbb{R}$ and obtain

$$\begin{aligned} S_{-s}\mathcal{L} &= \{x \in \mathbb{R}^2 : \langle S_s x - x_0, (-1, s)\rangle = 0\} \\ &= \{x \in \mathbb{R}^2 : \langle x - S_{-s}x_0, (S_s)^T(-1, s)\rangle = 0\} \\ &= \{x \in \mathbb{R}^2 : \langle x - S_{-s}x_0, (-1, 0)\rangle = 0\} \\ &= \{x = (x_1, x_2) \in \mathbb{R}^2 : x_1 = \hat{x}_1\}, \quad \text{where } \hat{x} = S_{-s}x_0, \end{aligned}$$

which is a line parallel to the $x_2$-axis. Here the power of shearlets comes into play, since it will allow us to only consider line singularities parallel to the $x_2$-axis. Of course, this requires that we also modify the shear parameter of the shearlet, that is, we will consider the right-hand side of

$$\langle f, \psi_{j,k,m}\rangle = \langle f(S_s\,\cdot), \psi_{j,\hat{k},m}\rangle$$

with the new shear parameter $\hat{k} = k + 2^{j/2}s$. The integrand in $\langle f(S_s\,\cdot), \psi_{j,\hat{k},m}\rangle$ has its singularity exactly on the line $x_1 = \hat{x}_1$, i.e., on $S_{-s}\mathcal{L}$.
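The two facts used here can be checked numerically. The first is that $S_{-s}$ maps $\mathcal{L}$ onto a vertical line, and the second is that $S_k A_{2^j} S_s = S_{\hat k} A_{2^j}$ with $\hat k = k + 2^{j/2}s$. The sketch assumes the conventions $S_s = \begin{pmatrix}1 & s\\ 0 & 1\end{pmatrix}$ and $A_{2^j} = \operatorname{diag}(2^j, 2^{j/2})$, which agree with the computations in this section. The sample values of $j$, $k$, $s$ and $x_0$ are arbitrary, and the helper names are illustrative.

```python
import numpy as np


def shear(s: float) -> np.ndarray:
    """Shear matrix S_s = [[1, s], [0, 1]] (assumed convention)."""
    return np.array([[1.0, s], [0.0, 1.0]])


def parabolic(j: int) -> np.ndarray:
    """Parabolic scaling A_{2^j} = diag(2^j, 2^(j/2)) (assumed convention)."""
    return np.diag([2.0 ** j, 2.0 ** (j / 2.0)])


def check_shear_identities(j: int, k: float, s: float, x0) -> bool:
    # (a) S_k A_{2^j} S_s = S_{k_hat} A_{2^j}, i.e. shearing f changes k into k_hat
    k_hat = k + 2.0 ** (j / 2.0) * s
    ok_matrix = np.allclose(shear(k) @ parabolic(j) @ shear(s), shear(k_hat) @ parabolic(j))
    # (b) S_{-s} maps L = {x : <x - x0, (-1, s)> = 0} onto the line x_1 = (S_{-s} x0)_1
    x0 = np.asarray(x0, dtype=float)
    ts = np.linspace(-2.0, 2.0, 9)
    points = x0 + ts[:, None] * np.array([s, 1.0])  # (s, 1) is orthogonal to (-1, s)
    on_line = np.allclose((points - x0) @ np.array([-1.0, s]), 0.0)
    sheared = points @ shear(-s).T                   # each row is S_{-s} x
    vertical = np.allclose(sheared[:, 0], (shear(-s) @ x0)[0])
    return bool(ok_matrix and on_line and vertical)


print(check_shear_identities(6, -3, 1.25, [0.3, -0.7]))  # prints: True
```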

To simplify the expression for the integration bounds, we will fix a new origin on $S_{-s}\mathcal{L}$, that is, on $x_1 = \hat{x}_1$; the $x_2$ coordinate of the new origin will be fixed in the next paragraph. Since $f$ is only nonzero on $B$, the function $f$ will be equal to zero on one side of $S_{-s}\mathcal{L}$, say, $x_1 < \hat{x}_1$. It therefore suffices to estimate

$$\langle f_0(S_s\,\cdot)\chi_\Omega, \psi_{j,\hat{k},m}\rangle$$

for $f_0 \in C^2(\mathbb{R}^2)$ and $\Omega = \mathbb{R}_+ \times \mathbb{R}$. Let us assume that $\hat{k} < 0$. The other case can be handled similarly.

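Since $\psi$ is compactly supported, there exists some $c > 0$ such that $\operatorname{supp}\psi \subset [-c, c]^2$. By a rescaling argument, we can assume $c = 1$. Let

$$\mathcal{P}_{j,k} := \left\{x \in \mathbb{R}^2 : |x_1 + 2^{-j/2}k x_2| \le 2^{-j},\ |x_2| \le 2^{-j/2}\right\}. \tag{5.15}$$

With this notation we have $\operatorname{supp}\psi_{j,k,0} \subset \mathcal{P}_{j,k}$. We say that the shearlet normal direction of the shearlet box $\mathcal{P}_{j,0}$ is $(1, 0)$; thus the shearlet normal of a sheared element $\psi_{j,k,m}$ associated with $\mathcal{P}_{j,k}$ is $(1, k/2^{j/2})$. Now, we fix our origin so that, relative to this new origin, it holds that

$$\operatorname{supp}\psi_{j,\hat{k},m} \subset \mathcal{P}_{j,\hat{k}} + (2^{-j}, 0) =: \tilde{\mathcal{P}}_{j,\hat{k}}.$$

Then one face of $\tilde{\mathcal{P}}_{j,\hat{k}}$ intersects the origin.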
Next, observe that the parallelogram $\tilde{\mathcal{P}}_{j,\hat{k}}$ has sides

$$x_2 = \pm 2^{-j/2}, \qquad 2^{j}x_1 + 2^{j/2}\hat{k}x_2 = 0, \qquad 2^{j}x_1 + 2^{j/2}\hat{k}x_2 = 2.$$

As it is only a matter of scaling, we replace the right-hand side of the last equation with 1 for simplicity. Solving the last two equalities for $x_2$ gives the following lines:

$$L_1 : x_2 = -\frac{2^{j/2}}{\hat{k}}x_1, \qquad L_2 : x_2 = -\frac{2^{j/2}}{\hat{k}}x_1 + \frac{2^{-j/2}}{\hat{k}}.$$

We see that

$$|\langle f_0(S_s\,\cdot)\chi_\Omega, \psi_{j,\hat{k},m}\rangle| \lesssim \left|\int_0^{K_1}\int_{L_1}^{L_2} f_0(S_s x)\,\psi_{j,\hat{k},m}(x)\,dx_2\,dx_1\right|, \tag{5.16}$$

where the upper integration bound for $x_1$ is $K_1 = 2^{-j} - 2^{-j}\hat{k}$; this follows from solving $L_2$ for $x_1$ and using that $|x_2| \le 2^{-j/2}$. We remark that the inner integration over $x_2$ is along lines parallel to the singularity line $\partial\Omega = \{0\} \times \mathbb{R}$; as mentioned, this allows us to better handle the singularity and will be used several times throughout this section.

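We consider the one-dimensional Taylor expansion for $f_0(S_s\,\cdot)$ in the $x_2$-direction at each point $x = (x_1, x_2) \in L_1$:

$$f_0(S_s x) = a(x_1) + b(x_1)\left(x_2 + \frac{2^{j/2}}{\hat{k}}x_1\right) + c(x_1, x_2)\left(x_2 + \frac{2^{j/2}}{\hat{k}}x_1\right)^2,$$

where $a(x_1)$, $b(x_1)$ and $c(x_1, x_2)$ are all bounded in absolute value by $C(1 + |s|)^2$. Using this Taylor expansion in (5.16) yields

$$|\langle f_0(S_s\,\cdot)\chi_\Omega, \psi_{j,\hat{k},m}\rangle| \lesssim \int_0^{K_1}\sum_{i=1}^{3}|I_i(x_1)|\,dx_1, \tag{5.17}$$

where

$$I_1(x_1) = \int_{L_1}^{L_2}\psi_{j,\hat{k},m}(x)\,dx_2, \tag{5.18}$$

$$I_2(x_1) = \int_{L_1}^{L_2}(x_2 + K_2)\,\psi_{j,\hat{k},m}(x)\,dx_2, \tag{5.19}$$

$$I_3(x_1) = \int_0^{2^{-j/2}/\hat{k}}(x_2)^2\,\psi_{j,\hat{k},m}(x_1, x_2 - K_2)\,dx_2, \tag{5.20}$$

and

$$K_2 = \frac{2^{j/2}}{\hat{k}}x_1.$$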
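We next estimate each of the integrals $I_1$, $I_2$, and $I_3$ separately.

Integral $I_1$. We first estimate $I_1(x_1)$. The Fourier slice theorem, see also (5.13), yields directly that

$$I_1(x_1) = \int_{\mathbb{R}}\psi_{j,\hat{k},m}(x)\,dx_2 = \int_{\mathbb{R}}\hat\psi_{j,\hat{k},m}(\xi_1, 0)\,e^{2\pi i x_1\xi_1}\,d\xi_1.$$

By the assumptions from Setup 2, we have, for all $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2$,

$$|\hat\psi(\xi)| \lesssim |h(\xi_1)|\left(1 + \frac{|\xi_2|}{|\xi_1|}\right)^{-\gamma}$$

for some $h \in L^1(\mathbb{R})$. Hence, we can continue our estimate of $I_1$ by

$$I_1(x_1) \lesssim \int_{\mathbb{R}} 2^{-3j/4}|h(2^{-j}\xi_1)|\,(1 + |\hat{k}|)^{-\gamma}\,d\xi_1,$$

and further, by a change of variables,

$$I_1(x_1) \lesssim \int_{\mathbb{R}} 2^{j/4}|h(\xi_1)|\,(1 + |\hat{k}|)^{-\gamma}\,d\xi_1 \lesssim 2^{j/4}(1 + |\hat{k}|)^{-\gamma}, \tag{5.21}$$

since $h \in L^1(\mathbb{R})$.

Integral $I_2$. We start estimating $I_2(x_1)$ by

$$I_2(x_1) \le \left|\int_{\mathbb{R}} x_2\,\psi_{j,\hat{k},m}(x)\,dx_2\right| + |K_2|\left|\int_{\mathbb{R}}\psi_{j,\hat{k},m}(x)\,dx_2\right| =: S_1 + S_2.$$

Applying the Fourier slice theorem again and then utilizing the decay assumptions on $\hat\psi$ yields

$$S_1 \lesssim \int_{\mathbb{R}}\left|\frac{\partial}{\partial\xi_2}\hat\psi_{j,\hat{k},m}(\xi_1, 0)\right|d\xi_1 \lesssim \int_{\mathbb{R}} 2^{-j/2}\,2^{-3j/4}\,|h(2^{-j}\xi_1)|\,(1 + |\hat{k}|)^{-\gamma}\,d\xi_1 \lesssim 2^{-j/4}(1 + |\hat{k}|)^{-\gamma}.$$

Since $|x_1| \le -\hat{k}/2^{j}$, we have $|K_2| \le 2^{-j/2}$. The following estimate of $S_2$ then follows directly from the estimate of $I_1$:

$$S_2 \lesssim |K_2|\,2^{j/4}(1 + |\hat{k}|)^{-\gamma} \le 2^{-j/4}(1 + |\hat{k}|)^{-\gamma}.$$

From the last two estimates, we conclude that $I_2(x_1) \lesssim 2^{-j/4}(1 + |\hat{k}|)^{-\gamma}$.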
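Integral $I_3$. Finally, we estimate $I_3(x_1)$ by

$$I_3(x_1) \le \int_0^{2^{-j/2}/|\hat{k}|}(x_2)^2\,\|\psi_{j,\hat{k},m}\|_{L^\infty}\,dx_2 \lesssim 2^{3j/4}\int_0^{2^{-j/2}/|\hat{k}|}(x_2)^2\,dx_2 \lesssim 2^{-3j/4}|\hat{k}|^{-3}. \tag{5.22}$$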






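We see that $I_2$ decays faster than $I_1$; hence we can leave $I_2$ out of our analysis. Applying (5.21) and (5.22) to (5.17), and using $K_1 = 2^{-j}(1 + |\hat{k}|)$, we obtain

$$|\langle f_0(S_s\,\cdot)\chi_\Omega, \psi_{j,\hat{k},m}\rangle| \lesssim \frac{2^{-3j/4}}{(1 + |\hat{k}|)^{\gamma - 1}} + \frac{2^{-7j/4}}{|\hat{k}|^2}. \tag{5.23}$$

Suppose that $|s| \le 3$. Then $|\hat{k}| \le |k| + 3\cdot 2^{j/2} \lesssim 2^{j/2}$, so that $2^{-7j/4}|\hat{k}|^{-2} \lesssim 2^{-3j/4}|\hat{k}|^{-3}$, and (5.23) reduces to

$$|\langle f, \psi_{j,k,m}\rangle| \lesssim \frac{2^{-3j/4}}{(1 + |\hat{k}|)^{\gamma - 1}} + \frac{2^{-3j/4}}{|\hat{k}|^3} \lesssim \frac{2^{-3j/4}}{|\hat{k}|^3},$$

since $\gamma \ge 4$. This proves (i).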

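On the other hand, if $|s| \ge 3/2$, then

$$|\langle f, \psi_{j,k,m}\rangle| \lesssim 2^{-9j/4}.$$

To see this, note that

$$\frac{2^{-3j/4}}{(1 + |k + 2^{j/2}s|)^3} = \frac{2^{-9j/4}}{(2^{-j/2} + |k/2^{j/2} + s|)^3} \le \frac{2^{-9j/4}}{|k/2^{j/2} + s|^3}$$

and

$$|k/2^{j/2} + s| \ge |s| - |k/2^{j/2}| \ge 1/2 - 2^{-j/2} \ge 1/4$$

for sufficiently large $j \ge 0$, since $|k| \le \lceil 2^{j/2}\rceil \le 2^{j/2} + 1$. This proves (ii).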
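The lower bound $1/4$ can be brute-forced over all admissible shears $|k| \le \lceil 2^{j/2}\rceil$ in the worst case $|s| = 3/2$. The sketch below does this and also checks the resulting estimate $2^{-3j/4}(1 + |k + 2^{j/2}s|)^{-3} \le 64\cdot 2^{-9j/4}$, where the constant $64 = 4^3$ comes from the bound $1/4$. It is an illustrative Python sketch with hypothetical helper names.

```python
import math


def min_distance_from_shear(j: int, s_abs: float = 1.5) -> float:
    """min over |k| <= ceil(2^(j/2)) and |s| >= s_abs of |k / 2^(j/2) + s|.

    For fixed a = k / 2^(j/2), the minimum over |s| >= s_abs equals max(0, s_abs - |a|)."""
    r = 2.0 ** (j / 2.0)
    kmax = math.ceil(r)
    return min(max(0.0, s_abs - abs(k) / r) for k in range(-kmax, kmax + 1))


def case_ii_ratio(j: int, k: int, s: float) -> float:
    """(2^(-3j/4) / (1 + |k + 2^(j/2) s|)^3) divided by 2^(-9j/4)."""
    k_hat = k + 2.0 ** (j / 2.0) * s
    return 2.0 ** (-0.75 * j) / (1.0 + abs(k_hat)) ** 3 / 2.0 ** (-2.25 * j)


if __name__ == "__main__":
    print([round(min_distance_from_shear(j), 3) for j in range(1, 9)])
```

For small $j$ (e.g. $j = 1$) the minimum drops below $1/4$. This is why the argument requires $j$ to be sufficiently large.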



Case (iii). Finally, we need to consider the case (iii), in which the normal vector of the hyperplane $\mathcal{L}$ is of the form $(0, s)$ for $s \in \mathbb{R}$. For this, let $\Omega = \{x \in \mathbb{R}^2 : x_2 \ge 0\}$. As in the first part of the proof, it suffices to consider coefficients of the form $\langle \chi_\Omega f_0, \psi_{j,k,m}\rangle$, where $\operatorname{supp}\psi_{j,k,m} \subset \mathcal{P}_{j,k} - (2^{-j}, 0) = \tilde{\mathcal{P}}_{j,k}$ with respect to some new origin. As before, the boundary of $\tilde{\mathcal{P}}_{j,k}$ intersects the origin. By the assumptions in Setup 2, we have that

$$\left(\frac{\partial}{\partial\xi_1}\right)^{\ell}\hat\psi(0, \xi_2) = 0 \quad \text{for } \ell = 0, 1,$$

which implies that

$$\int_{\mathbb{R}} x_1^{\ell}\,\psi(x)\,dx_1 = 0 \quad \text{for all } x_2 \in \mathbb{R} \text{ and } \ell = 0, 1.$$

Therefore, we have

$$\int_{\mathbb{R}} x_1^{\ell}\,\psi(S_k x)\,dx_1 = 0 \quad \text{for all } x_2 \in \mathbb{R},\ k \in \mathbb{R}, \text{ and } \ell = 0, 1, \tag{5.24}$$

since a shearing operation preserves vanishing moments along the $x_1$ axis. Now, we employ a Taylor expansion of $f_0$ in the $x_1$-direction (that is, again along the singularity line $\partial\Omega$). By (5.24), everything but the last term in the Taylor expansion disappears, and we obtain

$$|\langle \chi_\Omega f_0, \psi_{j,k,m}\rangle| \lesssim 2^{3j/4}\cdot 2^{-j/2}\cdot 2^{-3j} = 2^{-11j/4},$$

which proves claim (iii).

□



We are now ready to show the estimates (5.6) and (5.7), which by Lem. 5.2(ii) complete the proof of Thm. 5.1.



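For $j \ge 0$, fix $Q \in \mathcal{Q}_j^0$, where $\mathcal{Q}_j^0 \subset \mathcal{Q}_j$ is the collection of dyadic squares that intersect $\mathcal{L}$. We then have the following counting estimate:

$$\#|\mathcal{M}_{j,k,Q}| \lesssim |k + 2^{j/2}s| + 1 \tag{5.25}$$

for each $|k| \le \lceil 2^{j/2}\rceil$, where

$$\mathcal{M}_{j,k,Q} := \{m \in \mathbb{Z}^2 : |\operatorname{supp}\psi_{j,k,m} \cap \mathcal{L} \cap Q| \ne 0\}.$$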

To see this claim, note that for a fixed $j$ and $k$ we need to count the number of translates $m \in \mathbb{Z}^2$ for which the support of $\psi_{j,k,m}$ intersects the discontinuity line $\mathcal{L} : x_1 = s x_2 + b$, $b \in \mathbb{R}$, inside $Q$. Without loss of generality, we can assume that $Q = [0, 2^{-j/2}]^2$, $b = 0$, and $\operatorname{supp}\psi_{j,k,0} \subset C\cdot\mathcal{P}_{j,k}$, where $\mathcal{P}_{j,k}$ is defined as in (5.15). The shearlet $\psi_{j,k,m}$ will therefore be concentrated around the line $\mathcal{S}_m : x_1 = -\frac{k}{2^{j/2}}x_2 + 2^{-j}m_1 + 2^{-j/2}m_2$, see also Fig. 11. We will count the number of $m = (m_1, m_2) \in \mathbb{Z}^2$ for which these two lines intersect inside $Q$, since this number, up to multiplication with a constant independent of the scale $j$, will be equal to $\#|\mathcal{M}_{j,k,Q}|$.

First note that since the size of $Q$ is $2^{-j/2} \times 2^{-j/2}$, only a finite number of $m_2$ translates can make $\mathcal{S}_m \cap \mathcal{L} \cap Q \ne \emptyset$ whenever $m_1 \in \mathbb{Z}$ is fixed. For a fixed $m_2 \in \mathbb{Z}$, we then estimate the number of relevant $m_1$ translates. Equating the $x_1$ coordinates in $\mathcal{L}$ and $\mathcal{S}_m$ yields

$$\left(\frac{k}{2^{j/2}} + s\right)x_2 = 2^{-j}m_1 + 2^{-j/2}m_2.$$

Without loss of generality, we take $m_2 = 0$, which then leads to

$$2^{-j}|m_1| \le 2^{-j/2}|k + 2^{j/2}s|\,|x_2| \le 2^{-j}|k + 2^{j/2}s|,$$

hence $|m_1| \le |k + 2^{j/2}s|$. This completes the proof of the claim.
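Under the simplifications made in the proof ($Q = [0, 2^{-j/2}]^2$, $b = 0$, $m_2 = 0$), the relevant translates $m_1$ can be counted by brute force. The sketch below uses a hypothetical helper and illustrative parameters. It counts the integers $m_1$ for which $(k/2^{j/2} + s)x_2 = 2^{-j}m_1$ has a solution $x_2 \in [0, 2^{-j/2}]$ and compares the count with $|k + 2^{j/2}s| + 1$.

```python
import math


def count_m1(j: int, k: int, s: float) -> int:
    """Hypothetical helper: number of integers m1 (with m2 = 0) such that the centre line
    x1 = -(k / 2^(j/2)) x2 + 2^(-j) m1 meets L: x1 = s x2 at a height x2 in [0, 2^(-j/2)].

    Equating the x1-coordinates gives (k / 2^(j/2) + s) x2 = 2^(-j) m1."""
    slope = k / 2.0 ** (j / 2.0) + s
    top = 2.0 ** (-j / 2.0)
    k_hat = k + 2.0 ** (j / 2.0) * s
    window = int(abs(k_hat)) + 2  # admissible m1 satisfy |m1| <= |k_hat|
    count = 0
    for m1 in range(-window, window + 1):
        if abs(slope) < 1e-15:
            hit = m1 == 0  # parallel lines meet only if they coincide
        else:
            x2 = 2.0 ** (-j) * m1 / slope
            hit = -1e-12 * top <= x2 <= top * (1.0 + 1e-12)
        count += int(hit)
    return count


if __name__ == "__main__":
    j, s = 6, 0.7
    for k in (-8, -3, 0, 5, 8):
        print(k, count_m1(j, k, s), abs(k + 2 ** (j / 2) * s) + 1)
```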
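For $\varepsilon > 0$, we will consider the shearlet coefficients larger than $\varepsilon$ in absolute value. Thus, we define:

$$\mathcal{M}_{j,k,Q}(\varepsilon) = \{m \in \mathcal{M}_{j,k,Q} : |\langle f, \psi_{j,k,m}\rangle| > \varepsilon\},$$

where $Q \in \mathcal{Q}_j^0$. Since the discontinuity line $\mathcal{L}$ has finite length in $[0,1]^2$, we have the estimate $\#|\mathcal{Q}_j^0| \lesssim 2^{j/2}$. Assume $\mathcal{L}$ has normal vector $(-1, s)$ with $|s| \le 3$. Then, by Thm. 5.6(i), $|\langle f, \psi_{j,k,m}\rangle| > \varepsilon$ implies that

$$|k + 2^{j/2}s| \lesssim \varepsilon^{-1/3}2^{-j/4}. \tag{5.26}$$

By Lem. 5.2(i) and the estimates (5.25) and (5.26), we have that

$$\begin{aligned}\#|\Lambda(\varepsilon)| &\lesssim \sum_{j=0}^{\frac43\log_2(\varepsilon^{-1})+C}\ \sum_{Q\in\mathcal{Q}_j^0}\ \sum_{\{\hat k : |\hat k| \lesssim \varepsilon^{-1/3}2^{-j/4}\}} \#|\mathcal{M}_{j,k,Q}(\varepsilon)| \\ &\lesssim \sum_{j=0}^{\frac43\log_2(\varepsilon^{-1})+C}\ \sum_{Q\in\mathcal{Q}_j^0}\ \sum_{\{\hat k : |\hat k| \lesssim \varepsilon^{-1/3}2^{-j/4}\}} (|\hat k| + 1) \\ &\lesssim \sum_{j=0}^{\frac43\log_2(\varepsilon^{-1})+C} \#|\mathcal{Q}_j^0|\,\varepsilon^{-2/3}2^{-j/2} \\ &\lesssim \varepsilon^{-2/3}\sum_{j=0}^{\frac43\log_2(\varepsilon^{-1})+C} 1 \lesssim \varepsilon^{-2/3}\log_2(\varepsilon^{-1}),\end{aligned}$$

where, as usual, $\hat k = k + s2^{j/2}$. By Lem. 5.2(ii), this leads to the sought estimates.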

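On the other hand, if $\mathcal{L}$ has normal vector $(0, 1)$ or $(-1, s)$ with $|s| > 3$, then $|\langle f, \psi_\lambda\rangle| > \varepsilon$ implies that

$$j \le \tfrac{4}{9}\log_2(\varepsilon^{-1}),$$

which follows by assertions (ii) and (iii) in Thm. 5.6. Hence, we have

$$\#|\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac49\log_2(\varepsilon^{-1})}\ \sum_{k}\ \sum_{Q\in\mathcal{Q}_j^0} \#|\mathcal{M}_{j,k,Q}(\varepsilon)|.$$

Note that $\#|\mathcal{M}_{j,k,Q}| \lesssim 2^{j/2}$, since $\#|\{m \in \mathbb{Z}^2 : |\operatorname{supp}\psi_\lambda \cap Q| \ne 0\}| \lesssim 2^{j/2}$ for each $Q \in \mathcal{Q}_j$, and that the number of shear parameters $k$ for each scale parameter $j \ge 0$ is bounded by $C2^{j/2}$. Therefore,

$$\#|\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac49\log_2(\varepsilon^{-1})} 2^{j/2}\,2^{j/2}\,2^{j/2} = \sum_{j=0}^{\frac49\log_2(\varepsilon^{-1})} 2^{3j/2} \lesssim 2^{\frac32\cdot\frac49\log_2(\varepsilon^{-1})} = \varepsilon^{-2/3}.$$

This implies our sought estimate (5.6), which, together with the estimate for $|s| \le 3$, completes the proof of Thm. 5.1 for $L = 1$ under Setup 2.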
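The two sums above can be evaluated directly. In the sketch below, all implicit constants are set to 1, $C = 0$, and $\#|\mathcal{Q}_j^0|$ is replaced by $2^{j/2}$. These are assumptions for illustration only, and the helper names are hypothetical. The first sum then behaves like $\varepsilon^{-2/3}\log_2(\varepsilon^{-1})$ and the second like $\varepsilon^{-2/3}$.

```python
import math


def small_slope_sum(eps: float) -> float:
    """Sum over j <= (4/3) log2(1/eps) of #Q_j^0 * eps^(-2/3) * 2^(-j/2), with #Q_j^0 := 2^(j/2)."""
    J = math.floor(4.0 / 3.0 * math.log2(1.0 / eps))
    return sum(2.0 ** (j / 2.0) * eps ** (-2.0 / 3.0) * 2.0 ** (-j / 2.0) for j in range(J + 1))


def large_slope_sum(eps: float) -> float:
    """Sum over j <= (4/9) log2(1/eps) of 2^(j/2) * 2^(j/2) * 2^(j/2) = 2^(3j/2)."""
    J = math.floor(4.0 / 9.0 * math.log2(1.0 / eps))
    return sum(2.0 ** (1.5 * j) for j in range(J + 1))


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8, 1e-12):
        print(eps,
              small_slope_sum(eps) / (eps ** (-2 / 3) * math.log2(1 / eps)),
              large_slope_sum(eps) / eps ** (-2 / 3))
```

The geometric sum $\sum_{j \le J} 2^{3j/2}$ is at most $\frac{2^{3/2}}{2^{3/2}-1}\,2^{3J/2} \approx 1.55\cdot 2^{3J/2}$. This is why the second ratio stays bounded and no logarithm appears.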
5.1.7 The Case L ≠ 1

We now turn to the extended class of cartoon-like images $\mathcal{E}^2_L(\mathbb{R}^2)$ with $L \ne 1$, i.e., in which the singularity curve is only required to be piecewise $C^2$. We say that $p \in \partial B$ is a corner point if $\partial B$ is not smooth in $p$. The main focus here will be to investigate shearlets that interact with one of the $L$ corner points. We will argue that Thm. 5.1 also holds in this extended setting. The rest of the proof, that is, for shearlets not interacting with corner points, is of course identical to that presented in Sect. 5.1.5 and 5.1.6.

In the compactly supported case one can simply count the number of shearlets interacting with a corner point at a given scale. Using Lem. 5.2(i), one then arrives at the sought estimate. On the other hand, for the band-limited case one needs to measure the sparsity of the shearlet coefficients for $f$ localized to each dyadic square. We present the details in the remainder of this section.



Band-limited Shearlets. In this case, it is sufficient to consider a dyadic square $Q \in \mathcal{Q}_j^0$ with $j \ge 0$ such that $Q$ contains a singular point of the edge curve. In particular, we may assume that $j$ is sufficiently large so that the dyadic square $Q \in \mathcal{Q}_j^0$ contains a single corner point of $\partial B$. The following theorem analyzes the sparsity of the shearlet coefficients for such a dyadic square $Q \in \mathcal{Q}_j^0$.

Theorem 5.7. Let $f \in \mathcal{E}^2_L(\mathbb{R}^2)$ and let $Q \in \mathcal{Q}_j^0$ with $j \ge 0$ be a dyadic square containing a singular point of the edge curve. The sequence of shearlet coefficients $\{d_\lambda := \langle f_Q, \psi_\lambda\rangle : \lambda \in \Lambda_j\}$ obeys

$$\|(d_\lambda)_{\lambda\in\Lambda_j}\|_{w\ell^{2/3}} \le C.$$

The proof of Thm. 5.7 is based on a proof of an analogous result for curvelets [4]. Although the proof in [4] considers only curvelet coefficients, essentially the same arguments, with modifications to the shearlet setting, can be applied to show Thm. 5.7.

Finally, we note that the number of dyadic squares $Q \in \mathcal{Q}_j^0$ containing a singular point of $\partial B$ is bounded by a constant not depending on $j$; one could, e.g., take $L$ as this constant. Therefore, applying Thm. 5.7 and repeating the arguments in Sect. 5.1.5 completes the proof of Thm. 5.1 for $L \ne 1$ for Setup 1.



Compactly Supported Shearlets. In this case, it is sufficient to consider the following two cases.

Case 1. The shearlet $\psi_\lambda$ intersects a corner point, in which two curves $\partial B_0$ and $\partial B_1$, say, meet (see Fig. 13).

Case 2. The shearlet $\psi_\lambda$ intersects two edge curves $\partial B_0$ and $\partial B_1$, say, simultaneously, but it does not intersect a corner point (see Fig. 14).

We aim to show that $\#|\Lambda(\varepsilon)| \lesssim \varepsilon^{-2/3}$ in both cases. By Lem. 5.2, this will be sufficient.
Figure 13: A shearlet intersecting a corner point, in which two edge curves $\partial B_0$ and $\partial B_1$ meet. $\mathcal{L}_0$ and $\mathcal{L}_1$ are tangents to the edge curves $\partial B_0$ and $\partial B_1$ in this corner point.

Figure 14: A shearlet $\psi_\lambda$ intersecting two edge curves $\partial B_0$ and $\partial B_1$, which are part of the boundary of the sets $B_0$ and $B_1$. $\mathcal{L}_0$ and $\mathcal{L}_1$ are tangents to the edge curves $\partial B_0$ and $\partial B_1$ in points contained in the support of $\psi_\lambda$.



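Case 1. There are only finitely many corner points, and their total number does not depend on the scale $j \ge 0$. Moreover, the number of shearlets $\psi_\lambda$ intersecting each of the corner points is bounded by $C2^{j/2}$. Hence we have

$$\#|\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac43\log_2(\varepsilon^{-1})} 2^{j/2} \lesssim \varepsilon^{-2/3}.$$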



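Case 2. As illustrated in Fig. 14, we can write the function $f$ as

$$f_0\chi_{B_0} + f_1\chi_{B_1} = (f_0 - f_1)\chi_{B_0} + f_1 \quad \text{in } Q,$$

where $f_0, f_1 \in C^2([0,1]^2)$ and $B_0, B_1$ are two disjoint subsets of $[0,1]^2$. As we indicated before, the rate for optimal sparse approximation is achieved for the smooth function $f_1$. Thus, it is sufficient to consider $f := g_0\chi_{B_0}$ with $g_0 = f_0 - f_1 \in C^2([0,1]^2)$. By a truncated estimate, we can replace the two boundary curves $\partial B_0$ and $\partial B_1$ by hyperplanes of the form

$$\mathcal{L}_i = \{x \in \mathbb{R}^2 : \langle x - x_0, (-1, s_i)\rangle = 0\} \quad \text{for } i = 0, 1.$$

In the sequel, we assume $\max_{i=0,1}|s_i| \le 3$ and mention that the other cases can be handled similarly. Next define

$$\mathcal{M}^i_{j,k,Q} = \{m \in \mathbb{Z}^2 : |\operatorname{supp}\psi_{j,k,m} \cap \mathcal{L}_i \cap Q| \ne 0\} \quad \text{for } i = 0, 1,$$

for each $Q \in \tilde{\mathcal{Q}}_j$, where $\tilde{\mathcal{Q}}_j \subset \mathcal{Q}_j$ denotes the dyadic squares containing the two distinct boundary curves. By an estimate similar to (5.25), we obtain

$$\#\Big|\bigcap_{i=0}^{1}\mathcal{M}^i_{j,k,Q}\Big| \lesssim \min_{i=0,1}\left(|k + 2^{j/2}s_i| + 1\right). \tag{5.27}$$

Applying Thm. 5.6(i) to each of the hyperplanes $\mathcal{L}_0$ and $\mathcal{L}_1$, we also have

$$|\langle f, \psi_{j,k,m}\rangle| \le C\cdot\max_{i=0,1}\left\{\frac{2^{-3j/4}}{|2^{j/2}s_i + k|^3}\right\}. \tag{5.28}$$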
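Let $\hat k_i = k + 2^{j/2}s_i$ for $i = 0, 1$. Without loss of generality, we may assume that $|\hat k_0| \le |\hat k_1|$. Then (5.27) and (5.28) imply that

$$\#\Big|\bigcap_{i=0}^{1}\mathcal{M}^i_{j,k,Q}\Big| \lesssim |\hat k_0| + 1 \tag{5.29}$$

and

$$|\langle f, \psi_{j,k,m}\rangle| \lesssim \frac{2^{-3j/4}}{|\hat k_0|^3}. \tag{5.30}$$

Using (5.29) and (5.30), we now estimate $\#|\Lambda(\varepsilon)|$ as follows:

$$\begin{aligned}\#|\Lambda(\varepsilon)| &\lesssim \sum_{j=0}^{\frac43\log_2(\varepsilon^{-1})+C}\ \sum_{Q\in\tilde{\mathcal{Q}}_j}\ \sum_{\{\hat k_0 : |\hat k_0| \lesssim \varepsilon^{-1/3}2^{-j/4}\}} (1 + |\hat k_0|) \\ &\lesssim \sum_{j=0}^{\frac43\log_2(\varepsilon^{-1})+C} \#|\tilde{\mathcal{Q}}_j|\,(\varepsilon^{-2/3}2^{-j/2}) \lesssim \varepsilon^{-2/3}.\end{aligned}$$

Note that $\#|\tilde{\mathcal{Q}}_j| \le C$, since the number of $Q \in \mathcal{Q}_j$ containing two distinct boundary curves $\partial B_0$ and $\partial B_1$ is bounded by a constant independent of $j$. The result is proved.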
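The corner-point sums behave in the same way. In the sketch below, all constants are set to 1 and the number of squares containing both boundary curves is replaced by 1. These are illustrative assumptions, and the helper names are hypothetical. Both sums stay below a fixed multiple of $\varepsilon^{-2/3}$.

```python
import math


def corner_case1_sum(eps: float) -> float:
    """Sum over j <= (4/3) log2(1/eps) of 2^(j/2): shearlets meeting a fixed corner point."""
    J = math.floor(4.0 / 3.0 * math.log2(1.0 / eps))
    return sum(2.0 ** (j / 2.0) for j in range(J + 1))


def corner_case2_sum(eps: float) -> float:
    """Sum over j <= (4/3) log2(1/eps) of eps^(-2/3) * 2^(-j/2) (one square per scale)."""
    J = math.floor(4.0 / 3.0 * math.log2(1.0 / eps))
    return sum(eps ** (-2.0 / 3.0) * 2.0 ** (-j / 2.0) for j in range(J + 1))


if __name__ == "__main__":
    for eps in (1e-2, 1e-6, 1e-12):
        print(eps, corner_case1_sum(eps) / eps ** (-2 / 3), corner_case2_sum(eps) / eps ** (-2 / 3))
```

Both ratios are bounded by $\frac{\sqrt2}{\sqrt2 - 1} \approx 3.41$. In Case 1 this is a geometric sum dominated by its last term $2^{J/2} \le \varepsilon^{-2/3}$, and in Case 2 it is a convergent geometric series.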



5.2 Optimal Sparse Approximations in 3D 

When passing from 2D to 3D, the complexity of anisotropic structures changes significantly. In particular, as opposed to the two-dimensional setting, geometric structures of discontinuities for piecewise smooth 3D functions consist of two morphologically different types of structure, namely surfaces and curves. Moreover, as we saw in Sect. 5.1, the analysis of sparse approximations in 2D heavily depends on reducing the analysis to affine subspaces of $\mathbb{R}^2$. Clearly, these subspaces always have dimension one in 2D. In dimension three, however, we have subspaces of dimension one and two, and therefore the analysis needs to be performed on subspaces of the 'correct' dimension.

This issue manifests itself when performing the analysis for band-limited shearlets, since one needs to replace the Radon transform used in 2D with a so-called X-ray transform. For compactly supported shearlets, one needs to perform the analysis on carefully chosen hyperplanes of dimension two. This will allow for using estimates from the two-dimensional setting in a slice-by-slice manner.

As in the two-dimensional setting, analyzing the decay of individual shearlet coefficients $\langle f, \mathring{\psi}_\lambda\rangle$ can be used to show optimal sparsity for compactly supported shearlets, while the sparsity of the sequence of shearlet coefficients with respect to the weak $\ell^p$ quasi norm should be analyzed for band-limited shearlets.



5.2.1 A Heuristic Analysis 



As in the heuristic analysis for the 2D situation discussed in Sect. 5.1.1, we can again split the proof into three similar cases, as shown in Fig. 15.
Figure 15: The three types of shearlets $\psi_{j,k,m}$ and boundary $\partial B$ interactions considered in the heuristic 3D analysis: (a) shearlets whose support does not intersect the surface $\partial B$; (b) shearlets whose support overlaps with $\partial B$ and is nearly tangent; (c) shearlets whose support overlaps with $\partial B$ in a non-tangential way. Note that only a section of $\partial B$ is shown.



Only case (b) differs significantly from the 2D setting, so we restrict our attention to that case.

For case (b) there are at most $O(2^{j})$ coefficients at scale $j > 0$, since the plate-like elements are of size $2^{-j/2}$ times $2^{-j/2}$ (and 'thickness' $2^{-j}$). By Hölder's inequality, we see that

$$|\langle f, \psi_{j,k,m}\rangle| \le \|f\|_{L^\infty}\|\psi_{j,k,m}\|_{L^1} \le C_1 2^{-j}\|\psi\|_{L^1} \le C_2 2^{-j}$$

for some constants $C_1, C_2 > 0$. Hence, we have $O(2^{j})$ coefficients bounded by $C_2 2^{-j}$.

Assuming the coefficients in cases (a) and (c) to be negligible, the $n$th largest shearlet coefficient $c_n^*$ is therefore bounded by

$$|c_n^*| \le C\cdot n^{-1},$$

which in turn implies

$$\sum_{n > N}|c_n^*|^2 \le \sum_{n > N} C\cdot n^{-2} \le C\cdot\int_N^\infty x^{-2}\,dx \le C\cdot N^{-1}.$$

Hence, we meet the optimal rates (3.7) and (3.8) from Dfn. 3.1. This, at least heuristically, shows that shearlets provide optimally sparse approximations of 3D cartoon-like images.
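The heuristic can be mimicked with a toy sequence that has $2^j$ coefficients of size $2^{-j}$ at each scale $j$. All constants are set to 1, which is an illustrative assumption, and the helper names are hypothetical. Sorting the coefficients gives $n\,c_n^* < 2$, and the tail energy is then at most $4/N$.

```python
def heuristic_3d_coefficients(max_scale: int) -> list:
    """2^j coefficients of size 2^(-j) at each scale j = 0..max_scale (toy model of case (b)),
    sorted in decreasing order."""
    coeffs = []
    for j in range(max_scale + 1):
        coeffs.extend([2.0 ** (-j)] * (2 ** j))
    coeffs.sort(reverse=True)
    return coeffs


def tail_energy(coeffs: list, N: int) -> float:
    """sum_{n > N} |c*_n|^2 for the sorted list coeffs (1-based n)."""
    return sum(c * c for c in coeffs[N:])


if __name__ == "__main__":
    c = heuristic_3d_coefficients(14)
    print(max(n * v for n, v in enumerate(c, start=1)))
    for N in (10, 100, 1000, 10000):
        print(N, tail_energy(c, N) * N)
```

The bound $n\,c_n^* < 2$ holds because the coefficients of size $2^{-J}$ occupy the positions $2^J \le n \le 2^{J+1} - 1$. Then $\sum_{n>N} (2/n)^2 \le 4/N$.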

5.2.2 Main Result 

The hypotheses needed for the band-limited case, stated in Setup 3, are a straightforward generalization of Setup 1 in the two-dimensional setting.

Setup 3. The generators $\phi, \psi, \tilde\psi, \breve\psi \in L^2(\mathbb{R}^3)$ are band-limited and $C^\infty$ in the frequency domain. Furthermore, the shearlet system $SH(\phi, \psi, \tilde\psi, \breve\psi; c)$ forms a frame for $L^2(\mathbb{R}^3)$ (cf. the construction in Sect. 4.2).

For the compactly supported generators we will also use hypotheses in the spirit of Setup 2, but with slightly stronger and more sophisticated assumptions on the vanishing moment property of the generators, i.e., $\delta > 8$ and $\gamma \ge 4$.
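Setup 4. The generators $\phi, \psi, \tilde\psi, \breve\psi \in L^2(\mathbb{R}^3)$ are compactly supported, and the shearlet system $SH(\phi, \psi, \tilde\psi, \breve\psi; c)$ forms a frame for $L^2(\mathbb{R}^3)$. Furthermore, the function $\psi$ satisfies, for all $\xi = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3$,

(i) $|\hat\psi(\xi)| \le C \cdot \min\{1, |\xi_1|^{\delta}\}\min\{1, |\xi_1|^{-\gamma}\}\min\{1, |\xi_2|^{-\gamma}\}\min\{1, |\xi_3|^{-\gamma}\}$, and

(ii) $\left|\frac{\partial}{\partial \xi_i}\hat\psi(\xi)\right| \le |h(\xi_1)|\left(1 + \frac{|\xi_2|}{|\xi_1|}\right)^{-\gamma}\left(1 + \frac{|\xi_3|}{|\xi_1|}\right)^{-\gamma}$,

for $i = 2, 3$, where $\delta > 8$, $\gamma \ge 4$, $h \in L^1(\mathbb{R})$, and $C$ a constant, and $\tilde\psi$ and $\breve\psi$ satisfy analogous conditions with the obvious change of coordinates (cf. the construction in Sect. 4.3).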



The main result can now be stated as follows. 



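Theorem 5.8 ([11, 15]). Assume Setup 3 or 4. Let $L = 1$. For any $\nu > 0$ and $\mu > 0$, the shearlet frame $SH(\phi, \psi, \tilde\psi, \breve\psi; c)$ provides optimally sparse approximations of functions $f \in \mathcal{E}^2_L(\mathbb{R}^3)$ in the sense of Dfn. 3.1, i.e.,

$$\|f - f_N\|_{L^2}^2 \lesssim N^{-1}(\log N)^2 \quad \text{as } N \to \infty,$$

and

$$|c_n^*| \lesssim n^{-1}(\log n) \quad \text{as } n \to \infty,$$

where $c = \{\langle f, \mathring{\psi}_\lambda\rangle : \lambda \in \Lambda, \mathring{\psi} = \psi, \mathring{\psi} = \tilde\psi, \text{ or } \mathring{\psi} = \breve\psi\}$ and $c^* = (c_n^*)_{n\in\mathbb{N}}$ is a decreasing (in modulus) rearrangement of $c$.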



We now give a sketch of proof for this theorem, and refer to [11, 15] for detailed proofs.

5.2.3 Sketch of Proof of Theorem 5.8



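Band-limited Shearlets. The proof of Thm. 5.8 for band-limited shearlets follows the same steps as discussed in Sect. 5.1.5 for the 2D case. To indicate the main steps, we will use the same notation as for the 2D proof with the straightforward extension to 3D.

Similar to Thm. 5.3 and 5.4, one can prove the following results on the sparsity of the shearlet coefficients for each dyadic square $Q \in \mathcal{Q}_j$.

Theorem 5.9 ([11]). Let $f \in \mathcal{E}^2(\mathbb{R}^3)$. For $Q \in \mathcal{Q}_j^0$ with $j \ge 0$ fixed, the sequence of shearlet coefficients $\{d_\lambda := \langle f_Q, \psi_\lambda\rangle : \lambda \in \Lambda_j\}$ obeys

$$\|(d_\lambda)_{\lambda\in\Lambda_j}\|_{w\ell^{1}} \lesssim 2^{-j}.$$

Theorem 5.10. Let $f \in \mathcal{E}^2(\mathbb{R}^3)$. For $Q \in \mathcal{Q}_j^1$ with $j \ge 0$ fixed, the sequence of shearlet coefficients $\{d_\lambda := \langle f_Q, \psi_\lambda\rangle : \lambda \in \Lambda_j\}$ obeys

$$\|(d_\lambda)_{\lambda\in\Lambda_j}\|_{\ell^{1}} \lesssim 2^{-3j/2}.$$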
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The proofs of Thm. |5 .91 and |5.10| follow the same principles as the proofs of the 



analog results in 2D, Thm. 5.3 and 5.4 with one important difference: In the proof 



of Thm. 5.3 and 5.4 the Radon transform (cf. (5.13 1) is used to deduce estimates for 



the integral of edge-curve fragments. In 3D one needs to use a different transform, 
namely the so-called X-ray transform, which maps a function on M? into the sets 
of its line integrals. The X-ray transform is then used to deduce estimates for the 



integral of the surface fragments. We refer to [ 1 1 1 for a detailed exposition. 
As a consequence of Thm. 5.9 and 5.10, we have the following result.



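Theorem 5.11 ([11]). Suppose $f \in \mathcal{E}^2(\mathbb{R}^3)$. Then, for $j \ge 0$, the sequence of the shearlet coefficients $\{c_\lambda := \langle f, \psi_\lambda \rangle : \lambda \in \Lambda_j\}$ obeys

$$\|(c_\lambda)_{\lambda \in \Lambda_j}\|_{w\ell^1} \lesssim 1.$$

Proof. The result follows by the same arguments used in the proof of Thm. 5.5. □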



By Thm. 5.11 we can now prove Thm. 5.8 for the band-limited setup and for $f \in \mathcal{E}^2_L(\mathbb{R}^3)$ with $L = 1$. The proof is very similar to the proof of Thm. 5.1 in Sect. 5.1.5, so we will not repeat it.
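For orientation, here is one way the per-scale estimate of Thm. 5.11 leads to the coefficient decay claimed in Thm. 5.8. This is a sketch, not the argument of Sect. 5.1.5. It additionally assumes a uniform bound $|c_\lambda| \le C' 2^{-j}$ for $\lambda \in \Lambda_j$. That bound holds if $f$ is bounded and the 3D shearlets are $L^2$-normalized under the scaling $\mathrm{diag}(2^j, 2^{j/2}, 2^{j/2})$, so that $\|\psi_\lambda\|_{L^1} \lesssim 2^{-j}$. Under this assumption, no coefficient at a scale with $C' 2^{-j} \le \varepsilon$ exceeds $\varepsilon$. The weak $\ell^1$ bound gives $\#\{\lambda \in \Lambda_j : |c_\lambda| > \varepsilon\} \le \varepsilon^{-1} \|(c_\lambda)_{\lambda \in \Lambda_j}\|_{w\ell^1}$. The three pyramids only affect constants.

```latex
\begin{aligned}
\#\{\lambda \in \Lambda : |c_\lambda| > \varepsilon\}
  &= \sum_{j \ge 0} \#\{\lambda \in \Lambda_j : |c_\lambda| > \varepsilon\}
   \le \sum_{0 \le j < \log_2(\varepsilon^{-1}) + \log_2 C'}
        \varepsilon^{-1} \, \|(c_\lambda)_{\lambda \in \Lambda_j}\|_{w\ell^1} \\
  &\lesssim \varepsilon^{-1} \log_2(\varepsilon^{-1}).
\end{aligned}
```

Taking $\varepsilon$ just below $|c_n^*|$ gives $n \lesssim |c_n^*|^{-1} \log(1/|c_n^*|)$. Since $x \mapsto x^{-1}\log(1/x)$ is decreasing on $(0,1)$, this forces $|c_n^*| \lesssim n^{-1} \log n$ for large $n$.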



Compactly Supported Shearlets. In this part we consider the key estimates for the linearized term for compactly supported shearlets in 3D. This is an extension of Thm. 5.6 to the three-dimensional setting. Hence, we assume that the discontinuity surface is a plane, and consider the decay of the coefficients of shearlets interacting with such a discontinuity.



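Theorem 5.12 ([15]). Let $\psi \in L^2(\mathbb{R}^3)$ be compactly supported, and assume that $\psi$ satisfies the conditions in Setup 4. Further, let $\lambda$ be such that $\operatorname{supp} \psi_\lambda \cap \partial B \neq \emptyset$. Suppose that $f \in \mathcal{E}^2(\mathbb{R}^3)$ and that $\partial B$ is linear on the support of $\psi_\lambda$ in the sense that

$$\operatorname{supp} \psi_\lambda \cap \partial B \subset \mathcal{H}$$

for some affine hyperplane $\mathcal{H}$ of $\mathbb{R}^3$. Then:

(i) if $\mathcal{H}$ has normal vector $(-1, s_1, s_2)$ with $|s_1| \le 3$ and $|s_2| \le 3$,

$$|\langle f, \psi_\lambda \rangle| \lesssim \min_{i=1,2} \frac{2^{-j}}{|k_i + 2^{j/2} s_i|^3},$$

(ii) if $\mathcal{H}$ has normal vector $(-1, s_1, s_2)$ with $|s_1| > 3/2$ or $|s_2| > 3/2$,

$$|\langle f, \psi_\lambda \rangle| \lesssim 2^{-5j/2},$$

(iii) if $\mathcal{H}$ has normal vector $(0, s_1, s_2)$ with $s_1, s_2 \in \mathbb{R}$,

$$|\langle f, \psi_\lambda \rangle| \lesssim 2^{-3j}.$$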
Proof. Fix $\lambda$, and let $f \in \mathcal{E}^2(\mathbb{R}^3)$. We first consider case (ii) and assume $|s_1| > 3/2$. The hyperplane can be written as

$$\mathcal{H} = \{x \in \mathbb{R}^3 : \langle x - x_0, (-1, s_1, s_2) \rangle = 0\}$$

for some $x_0 \in \mathbb{R}^3$. For $\hat{x}_3 \in \mathbb{R}$, we consider the restriction of $\mathcal{H}$ to the slice $x_3 = \hat{x}_3$. This is clearly a line of the form

$$\mathcal{L} = \{x = (x_1, x_2) \in \mathbb{R}^2 : \langle x - x_0', (-1, s_1) \rangle = 0\}$$

for some $x_0' \in \mathbb{R}^2$. Hence we have reduced the singularity to a line singularity, which was already considered in Thm. 5.6. We now apply Thm. 5.6 to each slice and obtain

$$|\langle f, \psi_\lambda \rangle| \lesssim 2^{j/4} \cdot 2^{-9j/4} \cdot 2^{-j/2} = 2^{-5j/2}.$$

The first factor $2^{j/4}$ in the estimate above is due to the different normalization factors used for shearlets in 2D and 3D. The second factor is the conclusion from Thm. 5.6. The third is the length of the support of $\psi_\lambda$ in the direction of $x_3$. The case $|s_2| > 3/2$ can be handled similarly with restrictions to slices $x_2 = \hat{x}_2$ for $\hat{x}_2 \in \mathbb{R}$. This completes the proof of case (ii).

The other two cases, i.e., cases (i) and (iii), are proved using the same slice-by-slice technique and Thm. 5.6. □
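The slice-by-slice argument rests on two elementary computations. The first is an explicit point $x_0'$ of the sliced line. One valid choice, derived here and not taken from [15], is $x_0' = (x_{0,1} + s_2(\hat{x}_3 - x_{0,3}),\ x_{0,2})$. The second is the exponent arithmetic $1/4 - 9/4 - 1/2 = -5/2$. The sketch below checks both. The names `slice_point`, `on_plane` and `slice_exponent` are hypothetical. The exponents used are exactly the three factors quoted in the proof.

```python
from fractions import Fraction
import numpy as np

def slice_point(x0, s2, x3_hat):
    """A point x0' on the line obtained by cutting the plane
    <x - x0, (-1, s1, s2)> = 0 with the slice x3 = x3_hat (derived choice)."""
    return np.array([x0[0] + s2 * (x3_hat - x0[2]), x0[1]])

def on_plane(x, x0, s1, s2, tol=1e-9):
    """True if x in R^3 lies on the hyperplane through x0 with normal (-1, s1, s2)."""
    return abs(np.dot(np.asarray(x) - np.asarray(x0), [-1.0, s1, s2])) < tol

def slice_exponent(exp_2d):
    """Exponent a in |<f, psi_lambda>| <~ 2^{a j}: the 2D slice bound 2^{exp_2d j}
    times the normalization ratio 2^{j/4} times the support length 2^{-j/2}."""
    return exp_2d + Fraction(1, 4) - Fraction(1, 2)

# points of the sliced line, lifted back to x3 = x3_hat, lie on the plane
x0, s1, s2, x3_hat = np.array([0.3, -0.2, 0.5]), 2.0, 0.7, 1.1
p = slice_point(x0, s2, x3_hat)
for x2 in np.linspace(-1.0, 1.0, 5):
    x1 = p[0] + s1 * (x2 - p[1])   # solves <(x1, x2) - p, (-1, s1)> = 0
    assert on_plane([x1, x2, x3_hat], x0, s1, s2)

print(slice_exponent(Fraction(-9, 4)))   # case (ii): prints -5/2
```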



Neglecting truncated estimates, Thm. 5.12 can be used to prove the optimal sparsity result in Thm. 5.8. The argument is similar to the one in Sect. 5.1.6 and will not be repeated here. Let us simply argue that the decay rate $|\langle f, \psi_\lambda \rangle| \lesssim 2^{-5j/2}$ from Thm. 5.12 (ii) is what is needed in the case $|s_1| > 3/2$. It is easy to see that in 3D an estimate of the form

$$\#|\Lambda(\varepsilon)| \lesssim \varepsilon^{-1}$$

will guarantee optimal sparsity. The estimate $|\langle f, \psi_\lambda \rangle| \lesssim 2^{-5j/2}$ gives no control of the shearing parameter $k = (k_1, k_2)$. We therefore have to use a crude counting estimate that includes all shears at a given scale $j$, namely $2^{j/2} \cdot 2^{j/2} = 2^j$. Since the number of dyadic boxes $Q$ where $\partial B$ intersects the support of $f$ is of order $2^{3j/2}$, we arrive at

$$\#|\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac{2}{5}\log_2(\varepsilon^{-1})} 2^{5j/2} \lesssim \varepsilon^{-1}.$$
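The last display is a geometric sum with ratio $r = 2^{5/2}$. The sketch below evaluates it with the upper limit rounded down to an integer $J$ and compares it with $\varepsilon^{-1}$. Since $r^J \le \varepsilon^{-1} < r^{J+1}$, the product $\varepsilon \cdot \sum_{j \le J} r^j$ stays between $1/r$ and $r/(r-1)$, which is what $\lesssim \varepsilon^{-1}$ expresses. The name `count_bound` is hypothetical.

```python
import math

R = 2.0 ** 2.5   # ratio r = 2^{5/2} of the geometric series 2^{5j/2}

def count_bound(eps):
    """Evaluate sum_{j=0}^{J} 2^{5j/2} with J = floor((2/5) * log2(1/eps))."""
    J = math.floor(0.4 * math.log2(1.0 / eps))
    return sum(R ** j for j in range(J + 1))

for eps in [1e-2, 1e-4, 1e-6, 1e-8]:
    ratio = count_bound(eps) * eps
    # eps * sum is trapped between 1/R and R/(R-1), so the sum is O(1/eps)
    assert 1.0 / R < ratio < R / (R - 1)
    print(f"eps={eps:.0e}  eps * sum = {ratio:.3f}")
```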



5.2.4 Some Extensions 



Paralleling the two-dimensional setting (see Sect. 5.1.7), we can extend the optimality result in Thm. 5.8 to the cartoon-like image class $\mathcal{E}^2_L(\mathbb{R}^3)$ for $L \in \mathbb{N}$, in which the discontinuity surface $\partial B$ is allowed to be piecewise smooth.

Moreover, the requirement that the 'edge' $\partial B$ is piecewise $C^2$ might be too restrictive in some applications. Therefore, in [15], the cartoon-like image model class was enlarged to allow less regular images, where $\partial B$ is piecewise $C^\alpha$ smooth for $1 < \alpha \le 2$, and not necessarily $C^2$. This class $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$, introduced in Sect. 2, consists of generalized cartoon-like images having $C^\beta$ smoothness apart from a piecewise $C^\alpha$ discontinuity surface. The sparsity results presented above in Thm. 5.8 can be extended to this generalized model class for compactly supported shearlets with a scaling matrix dependent on $\alpha$. For this generalized model, the optimal approximation error rate, as usual measured in $\|f - f_N\|_{L^2}^2$, is $N^{-\alpha/2}$. Compare this to $N^{-1}$ for the case $\alpha = 2$ considered throughout this chapter. For brevity we will not go into details. We only mention that the approximation error rate obtained by shearlet frames is slightly worse than in the $\alpha = \beta = 2$ case: it is not only a poly-log factor away from the optimal rate, but a small polynomial factor. We refer to [15] for the precise statement and proof.



5.2.5 Surprising Observations 

Capturing anisotropic phenomena in 3D is somewhat different from capturing anisotropic features in 2D as discussed in Sect. 1.3. While in 2D we 'only' have to handle curves, in 3D a more complex situation can occur, since we find two geometrically very different anisotropic structures: curves and surfaces. Curves are clearly one-dimensional anisotropic features and surfaces two-dimensional features. Since our 3D shearlet elements are plate-like in the spatial domain by construction, one could think that these 3D shearlet systems would only be able to efficiently capture two-dimensional anisotropic structures, and not one-dimensional structures. Nonetheless, surprisingly, as we have discussed in Sect. 5.2.4, these 3D shearlet systems still perform optimally when representing and analyzing 3D data $\mathcal{E}^2_L(\mathbb{R}^3)$ that contain both curve and surface singularities (see, e.g., Fig. 2).